Recognition of Radar Emitters with Agile Waveform Based on Hybrid Deep Neural Network and Attention Mechanism



Introduction
Radar emitter recognition is a key link in radar countermeasures and reconnaissance. Building on signal sorting, it extracts the characteristic parameters and operating parameters of radar emitters. From these parameters we can obtain information such as the system, use, type and platform of the target radar, further infer the battlefield situation, threat level, activity pattern and tactical intention, and thereby provide important intelligence support for friendly decision-making [1].
When the pulse density of the radar emitters in the electromagnetic environment is low and the conventional characteristic parameters of the emitter signals are essentially constant, traditional recognition models can achieve good results. With the increasing complexity of the electromagnetic environment and the continuous development of radar technology, more and more modern digital programmable radars using agile waveforms will appear on the future battlefield. Their signal forms include frequency agility, variable pulse width and repetition interval conversion. It is difficult to identify such complex radar emitters effectively by relying only on conventional signal characteristics. A new recognition model structure is therefore urgently needed for radar emitters with agile waveform, to meet the requirements of electronic warfare in the new era.
In the early development of radar emitter recognition technology, when electronic countermeasure technology and the electromagnetic environment were relatively simple, researchers mainly studied template matching methods based on signal characteristic parameters [2]. These methods depend strongly on prior knowledge, and their recognition performance largely depends on the types of radar emitter signals in the database and the quality of the collected parameters. To make up for these shortcomings and to cope with the increasingly complex electromagnetic environment, researchers began to introduce artificial intelligence into radar emitter recognition [3]. These methods, however, do not start from the characteristics of the radar signal itself for feature extraction, and cannot handle complex and changeable radar emitters with new systems. As the number of such radars in modern warfare increased, researchers began to analyze the intrapulse characteristics of radar emitter signals [4]. These methods are limited in the signal types they cover: most of them target a few specific radar emitters and do not consider the influence of noise. In summary, most existing methods do not focus on radar emitters with agile waveform, which is not in line with actual operating conditions. This is a difficult problem that current research on radar emitter recognition in electronic countermeasures must face.
In recent years, with the rise of deep learning technology [5], the field of electronic warfare has also set off a deep learning boom [6][7][8]. Deep learning relies on optimized network structure and training method to achieve models with more layers, stronger expression capabilities, and faster convergence speed, making it possible to process large-scale feature data. On the other hand, deep learning can realize self-learning and self-extraction of features, and verify the effectiveness of feature extraction rules from the data dimension, thereby improving the overall recognition performance.
Inspired by the above ideas, and to deal with the problem that the conventional characteristic parameters of radar emitter signals with agile waveform are variable, this paper proposes a recognition method for radar emitters with agile waveform based on a hybrid deep neural network and an attention mechanism. First, we perform a distributed representation of the pulse signal data to generate high-dimensional sparse signal features. Then we use a dynamic Convolutional Neural Network to extract features of the structural details of radar emitter signals with agile waveform at different levels, and a Long Short-Term Memory network to extract their timing features. To obtain deep features that can characterize the agility of the waveform, an attention mechanism-based method fuses the extracted structural features and timing features, which at the same time reduces the influence of noise in a complex electromagnetic environment on the characteristic data of the radar emitter. Finally, the deep feature is input into the Softmax layer to complete the recognition of radar emitters with agile waveform.
Our contributions are as follows:
1. Regarding the problem of radar emitter recognition, most researchers mainly focus on conventional radar emitters, while this paper studies in depth the characteristics of radar emitters with agile waveform.
2. The hybrid deep neural network constructed in this paper can automatically extract features of different levels and details, and can deal with irregular changes and unknown distribution of radar emitter signals with agile waveform.
3. This paper uses the method based on attention mechanism for feature fusion, which can overcome the influence of noise in complex electromagnetic environment.

Problem Definition
Radar emitters with agile waveform are radar emitters whose signal parameters change rapidly. The parameters mainly include carrier frequency, pulse width, and pulse repetition interval, so a variety of complex signals such as frequency agility signals, variable pulse width signals, and repetition interval conversion signals belong to this category. Frequency agility signals can be divided into pulse-to-pulse agility and pulse group agility. Their commonly used types are fixed, jitter, sine, slippage, sweep, random, group, etc. Variable pulse width signals are signals whose pulse width changes. They can usually be divided into multiple sub-signals according to pulse width, and a variable pulse width signal is considered present only when multiple sub-signals exist. Repetition interval conversion signals refer to the various forms of pulse repetition interval used by radars to resolve range ambiguity or velocity ambiguity, or to counter reconnaissance and jamming. Their commonly used types are stable, jitter, sine, slippage, jagged, dwell & switch, hopping, etc.
The input of recognition of radar emitters with agile waveform is the pulse sequence obtained after the radar signal is sorted. Each pulse can usually be represented by pulse description words, namely pulse amplitude (PA), carrier frequency (CF), pulse width (PW), pulse repetition interval (PRI) and angle of arrival (AOA). These parameters are calculated from pulses classified as the same radar. The purpose of recognition of radar emitters with agile waveform is to determine the specific type of radar emitters. Because the above-mentioned characteristic parameters usually change, the radar emitter cannot be represented by a certain set of radar pulse description words, that is, the traditional processing method is no longer applicable, but must be represented by multiple groups of radar pulse description words, which increases the difficulty of feature representation and extraction of radar emitters with agile waveform.
Suppose that after the radar signal sorting step, $m$ pulses of a certain radar emitter with agile waveform are obtained. These pulses are arranged according to their time of arrival (TOA) to form a pulse sequence that represents the entire radar emitter signal. The data structure of the pulse sequence is $PDWSeq = [P_1, P_2, \ldots, P_m]$, where the number of pulses $m$ is not fixed. The data structure of the $i$-th pulse is $P_i = (pa_i, cf_i, pw_i, pri_i, aoa_i)$, where $pa_i$ is the characteristic value of the amplitude of the pulse, $cf_i$ is the characteristic value of the carrier frequency, $pw_i$ is the characteristic value of the pulse width, $pri_i$ is the repetition interval of the pulse, and $aoa_i$ is the characteristic value of the angle of arrival.
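The pulse sequence described above can be sketched as a small data structure. This is only an illustration: the class name `Pulse`, the helper `build_pdw_seq`, the explicit `toa` field and all numeric values are assumptions introduced here, not part of the original method.

```python
from typing import List, NamedTuple


class Pulse(NamedTuple):
    """One pulse description word (PDW); field names follow the text."""
    toa: float  # time of arrival, used here only for ordering (assumed field)
    pa: float   # pulse amplitude
    cf: float   # carrier frequency
    pw: float   # pulse width
    pri: float  # pulse repetition interval
    aoa: float  # angle of arrival


def build_pdw_seq(pulses: List[Pulse]) -> List[tuple]:
    """Sort pulses by TOA and return PDWSeq = [P_1, ..., P_m] as 5-tuples."""
    ordered = sorted(pulses, key=lambda p: p.toa)
    return [(p.pa, p.cf, p.pw, p.pri, p.aoa) for p in ordered]


# Illustrative values only; the text does not specify units or ranges.
pulses = [
    Pulse(toa=2.0, pa=0.8, cf=9410.0, pw=1.2, pri=1000.0, aoa=35.0),
    Pulse(toa=1.0, pa=0.9, cf=9400.0, pw=1.0, pri=1000.0, aoa=35.0),
]
pdw_seq = build_pdw_seq(pulses)  # m = 2 here, but m is not fixed in general
```

Because $m$ varies from emitter to emitter, the sequence is kept as a variable-length list rather than a fixed-size array.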

Recognition of Radar Emitters with Agile Waveform Based on Hybrid Deep Neural Network and Attention Mechanism
In order to analyze and process the pulse sequence to determine the specific category of the radar emitters, the recognition of radar emitters with agile waveform can be divided into four steps: distributed representation of pulse signal data, feature extraction, feature fusion and classification recognition. The specific process is shown in Fig. 1.

Distributed Representation of Pulse Signal Data
In 1986, Hinton et al. introduced the idea of distributed representation to symbolic data. Distributed representation of symbolic data is one of the core ideas of neural network models. Once the statistical learning model has been chosen, the quality of the constructed input features directly determines the performance of the radar emitter recognition system. Inspired by this idea, this section performs a distributed representation of pulse signal data, with the goal of generating vectors (or matrices) of the same length for input into the deep network.
At present, the most commonly used feature representation method in radar emitter recognition systems is to average or simply splice the elements $P_i$ of the pulse sequence $PDWSeq = [P_1, P_2, \ldots, P_m]$, so that a fixed-length feature vector can be generated. However, this simple representation loses valuable information in the original pulse signal data. If a sparse distributed method is used instead, $P_i$ is converted to $HP_i = (Hpa_i, Hcf_i, Hpw_i, Hpri_i, Haoa_i)$, where $Hpa_i$, $Hcf_i$, $Hpw_i$, $Hpri_i$ and $Haoa_i$ are five high-dimensional real-valued vectors whose dimensions are all set to 100. The Euclidean distance between different values of each feature then becomes closer, which retains more of the valuable information in the pulse signal data. In addition, although the features generated by distributed representation are high-dimensional and sparse, such equal-length high-dimensional sequences are particularly suitable as input to deep networks. The traditional feature representation method and the distributed feature representation method are shown in Fig. 2.
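The text does not state how each scalar parameter is mapped to its 100-dimensional vector. As a rough sketch, assuming a simple binned one-hot encoding (an assumption made here purely for illustration, as are the names `one_hot_bin`, `embed_pulse` and the ranges in `RANGES`), the conversion from $P_i$ to $HP_i$ might look like this:

```python
DIM = 100  # dimension of each distributed vector, as stated in the text


def one_hot_bin(value: float, lo: float, hi: float, dim: int = DIM) -> list:
    """Encode a scalar as a sparse vector of length dim with a single 1.0."""
    frac = (value - lo) / (hi - lo)
    idx = min(dim - 1, max(0, int(frac * dim)))  # clamp out-of-range values
    vec = [0.0] * dim
    vec[idx] = 1.0
    return vec


def embed_pulse(pulse: tuple, ranges: list) -> list:
    """Convert P_i = (pa, cf, pw, pri, aoa) into HP_i, five vectors of length DIM."""
    return [one_hot_bin(v, lo, hi) for v, (lo, hi) in zip(pulse, ranges)]


# Hypothetical parameter ranges, chosen only for illustration.
RANGES = [(0.0, 1.0), (9000.0, 10000.0), (0.1, 10.0), (100.0, 5000.0), (0.0, 360.0)]
hp = embed_pulse((0.9, 9400.0, 1.0, 1000.0, 35.0), RANGES)
```

Every pulse becomes five equal-length sparse vectors regardless of its raw values, which gives the equal-length, high-dimensional input the deep network expects. A learned embedding table could replace the one-hot step.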

Hierarchical Feature Extraction of Radar Emitters Based on Hybrid Deep Neural Network
To address the variability of the conventional characteristic parameters of radar emitter signals with agile waveform, a method for extracting the hierarchical features of radar emitters based on a hybrid deep neural network is proposed. The input is a pulse sequence $HPDWSeq = [HP_1, HP_2, \ldots, HP_m]$ that has undergone distributed representation, which makes it possible to synthesize information from hundreds or even thousands of pulses.

Use Dynamic CNN Model to Extract Features of Structural Details
Due to its special structure of local weight sharing, the Convolutional Neural Network (CNN) has unique advantages in local feature processing. Its layout is closer to that of an actual biological neural network, and weight sharing reduces the complexity of the network. In this section, we use the dynamic CNN model [9] to convert the distributed features of the pulse signal data into a structural feature vector with a fixed dimension. Compared with the traditional CNN model, the dynamic CNN model contains filter windows of different sizes [10], which can extract features of the structural details of radar emitter signals at different levels. The key technical points are wide convolution and dynamic k-max sampling.

Wide Convolution
The input of the dynamic CNN model is the preprocessed distributed feature $HPDWSeq = [HP_1, HP_2, \ldots, HP_m]$. The convolution operation uses filters to extract the local information in the input. The filter $f^r$ performs a wide convolution operation on the input $HPDWSeq$ to generate the local feature matrix

$$C^r = [c_1, c_2, \ldots, c_{n+r-1}], \qquad c_i = g\left(f^r \cdot HP_{i-r+1:i} + b\right),$$

where terms whose index exceeds the valid range are set to 0, $r$ is the width of the filter, $n$ is the dimension of each distributed feature, $g$ is a non-linear function, and $b$ is a bias term.
The most commonly used narrow convolution requires that the condition $n \geq r$ be met, that is, the width of the input feature matrix cannot be less than the width of the filter. As a result, the edge information of the input pulse signal data is ignored and the extracted structural features are incomplete. Figure 3 is a schematic diagram of narrow convolution and wide convolution when the filter dimension is 1 and the width $r$ is 3.
Therefore, this section finally chooses wide convolution technology to extract more complete features, which can cover all the information of the input data, including edge information.
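The difference between narrow and wide convolution can be illustrated on a one-dimensional sequence with a filter of width 3. The function `convolve_1d` below is a minimal sketch written for this illustration. It omits the non-linearity $g$ and the bias $b$, and it slides the filter without flipping it, as is usual in CNNs.

```python
def convolve_1d(x: list, f: list, wide: bool) -> list:
    """Slide filter f over sequence x.

    wide=True  zero-pads both ends so every input element, including the
               edges, is covered; the output has len(x) + len(f) - 1 entries.
    wide=False uses only fully overlapping positions; the output has
               len(x) - len(f) + 1 entries (none if len(x) < len(f)).
    """
    r = len(f)
    pad = [0.0] * (r - 1) if wide else []
    padded = pad + list(x) + pad
    out = []
    for i in range(len(padded) - r + 1):
        window = padded[i:i + r]
        out.append(sum(w * v for w, v in zip(f, window)))
    return out


x = [1.0, 2.0, 3.0, 4.0, 5.0]
f = [1.0, 1.0, 1.0]
wide_out = convolve_1d(x, f, wide=True)     # 7 values: edges are included
narrow_out = convolve_1d(x, f, wide=False)  # 3 values: edges are dropped
```

With an input shorter than the filter, the narrow variant returns nothing at all, while the wide variant still produces features.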

Dynamic k-max Sampling
The local feature matrix $C^r \in \mathbb{R}^{(n+r-1) \times (n+r-1)}$ extracted by the wide convolutional layer contains a large number of local features, but not all of them are helpful for the radar emitter recognition task. The sampling technique is used to compress the local feature matrix $C^r$, which avoids a large amount of useless calculation and noise in the deep model.
The parameter $k$ in dynamic k-max sampling means that the largest $k$ values of each row in the matrix $C^r$ are selected, from which the k-max pooling score matrix $c^{\max} \in \mathbb{R}^{(n+r-1) \times k}$ can be calculated. Its specific form is

$$c^{\max}_j = \text{k-max}\left(C^r_{j,:}\right), \quad j = 1, 2, \ldots, n+r-1.$$

The $k$ largest values in each row must keep their original order, to preserve the relative position information between different feature values. Unlike max sampling, dynamic k-max sampling saves the $k$ most significant features in the local feature matrix $C^r$, and retains as much relevant detail as possible, such as the number of occurrences of the maximum value and the relative position of the feature values.
The word dynamic in dynamic k-max sampling means that the parameter $k$ changes with the number of layers of the CNN model and the size of the distributed features of the input. The specific calculation method is

$$k = \max\left(k_{top}, \left\lceil \frac{M-m}{M}\, n \right\rceil\right),$$

where $M$ is the number of layers of the CNN model, $m$ is the position of the current dynamic k-max sampling layer, $n$ is the total number of input features, and $k_{top}$ is a fixed value, specifically the value of $k$ in the last sampling layer of the CNN model.
Therefore, this section chooses dynamic k-max sampling technology to flexibly extract features and reduce redundancy as much as possible.
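The two ingredients of dynamic k-max sampling can be sketched in a few lines. The rule for $k$ used here, $k = \max(k_{top}, \lceil (M-m)n/M \rceil)$, follows the original dynamic CNN formulation and is assumed to match the one intended in this section. The function names `dynamic_k` and `k_max_pool` are illustrative.

```python
def dynamic_k(k_top: int, num_layers: int, layer: int, n: int) -> int:
    """k for the sampling layer at 1-based position `layer` of `num_layers`."""
    numerator = (num_layers - layer) * n
    ceil_div = (numerator + num_layers - 1) // num_layers  # integer ceiling
    return max(k_top, ceil_div)


def k_max_pool(row: list, k: int) -> list:
    """Keep the k largest values of `row`, preserving their original order."""
    # Indices of the k largest values; ties go to the earlier position.
    top = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
    return [row[i] for i in sorted(top)]


# With 3 layers, 18 input features and k_top = 3, k shrinks layer by layer.
ks = [dynamic_k(k_top=3, num_layers=3, layer=m, n=18) for m in (1, 2, 3)]
pooled = k_max_pool([1.0, 5.0, 2.0, 5.0, 3.0], k=3)
```

Note that `pooled` keeps both occurrences of the maximum and their relative order, which is the information that plain max sampling would discard.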

Use Long Short-Term Memory to Extract Timing Features
Long Short-Term Memory (LSTM) [11] is a variant of the Recurrent Neural Network (RNN) that solves the vanishing gradient problem in RNN training and is very useful for mining long-distance sequence structure information. In this section, we use LSTM to model the pulse signals over a long distance and extract timing features of the pulse signal data, including short-term agility between pulses and long-term agility between batches. The LSTM in this section is divided into three layers: the distributed feature layer, the LSTM layer and the output layer. Its structure is shown in Fig. 4.
The distributed feature layer contains the preprocessed distributed feature $HPDWSeq = [HP_1, HP_2, \ldots, HP_m]$, which is used as the input of the entire LSTM layer. The output layer contains the timing features obtained by the LSTM layer, that is, a timing feature vector with a fixed dimension.
Each LSTM component in the LSTM layer is equivalent to a memory block. The memory block adds three kinds of gates to the hidden layer nodes of the RNN, namely the input gate, the forget gate and the output gate, denoted as $i_t$, $f_t$ and $o_t$. The memory block also contains one or more memory cells. A memory cell maintains a cell state $c_t$ and can retain long-term historical information. Therefore, the data source of each LSTM component mainly includes three parts: the distributed feature vector $HP_t$ of the pulse at the current moment, the feedback feature $h_{t-1}$ or $h_{t+1}$ at the adjacent moment, and the stored value $c_{t-1}$ in the memory cell. The calculation formula of the LSTM component during forward propagation is

$$\begin{aligned}
i_t &= \sigma\left(W_{xi} HP_t + W_{hi} h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} HP_t + W_{hf} h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} HP_t + W_{ho} h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_{xc} HP_t + W_{hc} h_{t-1} + b_c\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}$$

where $\sigma$ is the logistic function, $W$ is the weight matrix, $b$ is the bias vector, and their subscripts indicate the objects to which the parameters correspond.
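A single forward step of a memory block can be sketched in plain Python. The code assumes the standard LSTM update without peephole connections, since the text does not pin down the exact variant. The helper names (`affine`, `lstm_step`, `timing_feature`) and the parameter layout are introduced here for illustration only.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: list, x: list, U: list, h: list, b: list) -> list:
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def lstm_step(x: list, h_prev: list, c_prev: list, params: dict):
    """One forward step; params maps "i", "f", "o", "g" to (W, U, b) triples."""
    pre = {name: affine(W, x, U, h_prev, b) for name, (W, U, b) in params.items()}
    i = [sigmoid(z) for z in pre["i"]]    # input gate
    f = [sigmoid(z) for z in pre["f"]]    # forget gate
    o = [sigmoid(z) for z in pre["o"]]    # output gate
    g = [math.tanh(z) for z in pre["g"]]  # candidate cell update
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def timing_feature(sequence: list, params: dict, hidden: int) -> list:
    """Run the LSTM over a sequence and return the last hidden state."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

Returning the last hidden state is one common way to obtain a fixed-dimension timing feature vector from a variable-length pulse sequence. Other choices, such as averaging all hidden states, are equally possible.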

Radar Emitter Feature Fusion Based on Attention Mechanism
In order to reduce the influence of noise on the characteristic data of radar emitters in a complex electromagnetic environment and to fuse the above-mentioned hierarchical features, we propose a radar emitter feature fusion method based on the attention mechanism. The hierarchical features are input into the attention layer of the deep network together. The attention layer gives the essential features of the agile waveform more attention, that is, assigns them greater weight, and allocates only limited attention to the noisy part of the signal, that is, assigns it a smaller weight. This method realizes the feature fusion of radar emitters with agile waveform and minimizes the interference between signals in a noisy signal environment.
The attention mechanism is a mechanism for distributing attention in the process of simulating the human brain to recognize external things. In the process of human brain cognition, attention is usually only focused on the most critical part, and for the remaining uncritical parts, although the human brain can also receive that part of the information, it only allocates very limited attention to it. Attention mechanism was initially applied to machine vision task by researchers [12], which can significantly improve the performance of image recognition and target detection. Later, Kelvin et al. [13] introduced the attention mechanism into the image-text conversion task, which can effectively convert the form of pictures into the form of text. In recent years, the attention mechanism has begun to be successfully applied to sequence data processing task [14]. Inspired by the above ideas, we intend to use the attention mechanism to process pulse signal data.
The attention mechanism has two main aspects: deciding which part of the input needs attention, and allocating limited information processing resources to the important part. The essence of the attention mechanism is weighted summation. When using the attention mechanism for feature fusion, the input is $p$ feature vectors $f_i$ $(i = 1, 2, \ldots, p)$, where $p$ is the total number of structural detail feature vectors and timing feature vectors extracted above. We denote all the feature vectors that need to be calculated as $F = [f_1, f_2, \ldots, f_p]$, and compute

$$\alpha = \mathrm{softmax}\left(\omega^{\top} \tanh(F)\right), \qquad s = F \alpha^{\top},$$

where $\alpha$ is the normalized weight vector, $\omega$ is the parameter vector, and the final deep feature is $s = Att(HPDWSeq; \theta)$, where $\theta$ represents all the above-mentioned adjustable parameters. The processing focus of the attention mechanism is to enable the weight $\alpha$ to be calculated reasonably. The structure of the attention layer is shown in Fig. 5.
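Attention-based fusion as weighted summation can be sketched as follows. The scoring function used here, a dot product between a parameter vector and the tanh of each feature vector followed by a softmax, is a common choice and is an assumption of this sketch. The names `attention_fuse` and `omega` are illustrative, and all feature vectors are assumed to have the same length.

```python
import math


def softmax(scores: list) -> list:
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features: list, omega: list):
    """Fuse p equal-length feature vectors into one deep feature s.

    Returns (s, alpha), where alpha holds the normalized weights.
    """
    # Score each vector with omega^T tanh(f_i) (assumed scoring function).
    scores = [sum(w * math.tanh(v) for w, v in zip(omega, f)) for f in features]
    alpha = softmax(scores)
    dim = len(features[0])
    # Weighted sum of the feature vectors, one output dimension at a time.
    s = [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]
    return s, alpha


features = [[1.0, 2.0], [3.0, 4.0]]  # e.g. one structural and one timing vector
s_uniform, alpha_uniform = attention_fuse(features, omega=[0.0, 0.0])
s_biased, alpha_biased = attention_fuse(features, omega=[1.0, 1.0])
```

With a zero parameter vector every feature vector receives the same weight and the fusion reduces to a plain average. A trained parameter vector shifts the weight toward the more informative vectors.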

Radar Emitter Classification and Recognition
We pass the deep feature vector $s$ obtained by the attention layer to a standard fully connected neural network, and use the softmax layer for probability normalization to generate a conditional probability distribution $P(y \mid HPDWSeq)$. This is the conditional probability that the sequence belongs to category $y$ given the distributed pulse sequence $HPDWSeq$. The category with the highest conditional probability is assigned to the pulse sequence, which completes the recognition of radar emitters with agile waveform.
We use "softmax + cross entropy" as the cost function, and define a tagging vector $T$ for each $HPDWSeq$. If an $HPDWSeq$ falls into the $i$-th type, then the $i$-th element of the vector $T$ is 1 and all other elements are 0. To train the parameters, we use stochastic gradient descent to optimize the cross entropy error between the predicted distribution $Y$ and $T$. For each $HPDWSeq$, we define the objective function

$$J(\theta) = -\sum_{i} T_i \log Y_i,$$

where $\theta$ denotes the unknown parameters. The pre-training is to minimize the objective function by stochastic gradient descent.
Deep learning models use activation functions to gain the ability to fit nonlinear data. Frequently used activation functions include Sigmoid, tanh, and ReLU. These activation functions must have three characteristics, namely non-linearity, monotonicity, and differentiability. After weighing the advantages and disadvantages of the various activation functions, we use the Rectified Linear Unit (ReLU) function for nonlinear transformation in the above hybrid deep neural network.
The ReLU function is a threshold function about 0, and its definition is as follows:

$$\mathrm{ReLU}(x) = \max(0, x).$$

The advantages of the ReLU function are as follows:
1) The output of the ReLU function for negative numbers is 0, and only the output of positive numbers is retained, so that the parameters in the models can be kept sparse and the efficiency of model training can also be improved.
2) Compared with activation functions such as Sigmoid and tanh, the ReLU function is more consistent with the excitation principle of real neuron signals, and can also overcome the vanishing gradient problem. In addition, the ReLU function can significantly accelerate the convergence of the stochastic gradient descent algorithm.
3) After applying the ReLU function to deep learning models, better performance can be achieved without pre-training.
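The "softmax + cross entropy" cost can be worked through on a toy example. The sketch below assumes a one-hot tagging vector as described above. The function names are illustrative, and `grad_wrt_logits` uses the well-known result that the gradient of this cost with respect to the pre-softmax scores is the predicted distribution minus the tagging vector.

```python
import math


def softmax(logits: list) -> list:
    """Numerically stable softmax."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index: int, num_classes: int) -> list:
    """Tagging vector T: 1 at the true class, 0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(num_classes)]


def cross_entropy(y: list, t: list) -> float:
    """Cross entropy between predicted distribution y and tagging vector t."""
    return -sum(ti * math.log(yi) for yi, ti in zip(y, t) if ti > 0)


def grad_wrt_logits(y: list, t: list) -> list:
    """Gradient of the cost with respect to the pre-softmax scores."""
    return [yi - ti for yi, ti in zip(y, t)]


y = softmax([0.0, 0.0, 0.0])       # an untrained model: uniform over 3 classes
t = one_hot(1, 3)                  # the true class is the second one
loss = cross_entropy(y, t)         # equals log(3) for a uniform prediction
grad = grad_wrt_logits(y, t)       # negative only at the true class
predicted = max(range(len(y)), key=lambda j: y[j])  # highest-probability class
```

A gradient descent step moves the scores against `grad`, which raises the score of the true class and lowers the others.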

Experimental Analysis
This section first introduces the dataset of radar emitters with agile waveform, then describes the details of the experiment, and finally analyzes the results of the comparison experiments and gives conclusions.

Dataset of Radar Emitters with Agile Waveform
In order to verify the performance of the method proposed in this paper, we generated by simulation a dataset of radar emitters with agile waveform. It includes 100 radar emitters, each with 1000 working modes, and each mode is a pulse sequence composed of a varying number of pulse description words (PDWs). The number varies because the number of pulses available differs from one radar emitter to another. The characteristic parameters of each pulse of a radar emitter with agile waveform may change. Usually 40 to 200 PDWs can represent one mode of a radar emitter.
The dataset of radar emitters with agile waveform contains a total of 100,000 groups of radar emitter pulse sequences, that is, 100,000 signal samples, which are divided into training set, validation set and test set according to the ratio of 7:1:2. The training set contains 70,000 signal samples, which are mainly used for model training, the validation set contains 10,000 signal samples, which are mainly used for model correction and tuning, and the test set contains 20,000 signal samples, which are mainly used for model performance evaluation.
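The sample counts follow directly from the figures above, as this short check shows. The helper `split_counts` is a name introduced here for illustration.

```python
def split_counts(total: int, ratios=(7, 1, 2)) -> tuple:
    """Split `total` samples by integer ratios; any remainder goes to the first part."""
    unit, remainder = divmod(total, sum(ratios))
    counts = [r * unit for r in ratios]
    counts[0] += remainder
    return tuple(counts)


total_samples = 100 * 1000  # 100 emitters x 1000 working modes each
train, val, test = split_counts(total_samples)  # 70,000 / 10,000 / 20,000
```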
The simulation parameter settings of the dataset of radar emitters with agile waveform are shown in Tab. 1. The 100 radar emitters included can be roughly divided into 4 categories, of which the first and second categories are conventional radar emitters, and the third and fourth categories are radar emitters with agile waveform.
The first category refers to radar emitters with fixed characteristic parameters.
The second category refers to radar emitters that use a fixed pulse set, but the order of the pulses will change.
The third category refers to the radar emitters with agility between pulses, which has short-term agility characteristics, and its characteristic parameters will change with each pulse. The parameter values will vary in the same range or mostly overlap, but the parameter change patterns of different radar emitters are significantly different.
The fourth category refers to radar emitters that are agile between dwells. They have long-term agility characteristics. They use the same parameters to transmit a set of pulses, and then transmit the next set of pulses with different characteristic parameters, of which the range has a low overlap.
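To make the four categories concrete, the sketch below generates a carrier-frequency sequence for each of them. It is not the simulation used to build the dataset: the function name `simulate_cf`, the centre frequency, the offsets, the dwell length and the use of random selection to model a changing pulse order are all assumptions chosen for illustration.

```python
import random


def simulate_cf(category: int, m: int, rng: random.Random) -> list:
    """Return m carrier-frequency values for one of the four categories."""
    base = 9400.0  # hypothetical centre frequency
    if category == 1:  # fixed characteristic parameters
        return [base] * m
    if category == 2:  # fixed pulse set, but the order of the pulses changes
        pulse_set = [base, base + 10.0, base + 20.0]
        return [rng.choice(pulse_set) for _ in range(m)]
    if category == 3:  # agile between pulses: a new value for every pulse
        return [base + rng.uniform(-50.0, 50.0) for _ in range(m)]
    if category == 4:  # agile between dwells: constant within each dwell
        dwell = 10  # hypothetical number of pulses per dwell
        seq = []
        while len(seq) < m:
            value = base + 200.0 * rng.randrange(5)  # widely separated values
            seq.extend([value] * dwell)
        return seq[:m]
    raise ValueError("category must be 1, 2, 3 or 4")


rng = random.Random(0)
examples = {cat: simulate_cf(cat, 25, rng) for cat in (1, 2, 3, 4)}
```

The third category varies continuously within one range from pulse to pulse, whereas the fourth jumps between well-separated values only at dwell boundaries, which mirrors the short-term and long-term agility described above.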

Implementation Details
This section builds a deep learning development environment with Python 3.7, TensorFlow 1.15 and CUDA 10.0 to implement the method of this paper. We use batch gradient descent (BGD) to train the model, and use the recognition accuracy of each category of radar emitters and the overall recognition accuracy of the radar emitters to measure the performance of radar emitter recognition. The calculation formula is as follows:

$$P_i^r = \frac{N_i^r}{N_i} \times 100\%, \qquad P^r = \frac{\sum_i N_i^r}{\sum_i N_i} \times 100\%,$$

where $P_i^r$ is the recognition accuracy rate of category-$i$ radar emitters, $P^r$ is the overall recognition accuracy rate, $N_i^r$ is the number of category-$i$ radar emitters accurately recognized, and $N_i$ is the total number of category-$i$ radar emitters.
In this section, we use 10-fold cross-validation to determine the hyper-parameters in our model, and get the final recognition result on the validation set. In the end, we identify a set of hyper-parameters that achieves the highest overall accuracy, which is used in the following experiments. The filter size of the wide convolution in the dynamic CNN component is set to 5, the fixed parameter $k_{top}$ in the dynamic k-max sampling is set to 3, the dimensions of all hidden layers in the LSTM component are set to 200, and the dimensions of the remaining hidden layers are all set to 100. The dropout operation is performed on hidden layer nodes, the dropout rate [15] is set to 0.5, the mini-batch size is set to 30, and the initial learning rate is set to 0.0005. We use Batch Normalization to process input data, and Max-norm Regularization to process feature parameters.
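The two accuracy measures can be computed as in the sketch below, which takes the per-category accuracy to be the share of samples from that category that are recognized correctly, and the overall accuracy to be the share over all samples. The function name `recognition_accuracy` and the toy inputs are illustrative.

```python
def recognition_accuracy(categories: list, correct: list):
    """Return (per-category accuracy in %, overall accuracy in %).

    categories[j] is the category of sample j and correct[j] says whether
    sample j was recognized correctly.
    """
    totals, hits = {}, {}
    for cat, ok in zip(categories, correct):
        totals[cat] = totals.get(cat, 0) + 1
        hits[cat] = hits.get(cat, 0) + int(ok)
    per_category = {c: 100.0 * hits[c] / totals[c] for c in totals}
    overall = 100.0 * sum(hits.values()) / sum(totals.values())
    return per_category, overall


# Toy example: 2 samples of category 1 and 4 samples of category 2.
cats = [1, 1, 2, 2, 2, 2]
ok = [True, False, True, True, True, False]
per_cat, overall = recognition_accuracy(cats, ok)
```

The overall rate is weighted by the number of samples in each category, so it generally differs from the plain mean of the per-category rates.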
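For reference, the hyper-parameters listed above can be collected in one configuration object. The dictionary and its key names are hypothetical and only restate the values given in the text.

```python
# Hypothetical configuration dictionary; key names are illustrative.
HYPERPARAMS = {
    "wide_conv_filter_size": 5,    # filter size of the wide convolution
    "k_top": 3,                    # fixed k of the last k-max sampling layer
    "lstm_hidden_dim": 200,        # all hidden layers in the LSTM component
    "other_hidden_dim": 100,       # remaining hidden layers
    "dropout_rate": 0.5,           # applied to hidden layer nodes
    "batch_size": 30,
    "initial_learning_rate": 0.0005,
    "cross_validation_folds": 10,
}
```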
The following section will analyze the effectiveness of different parts of our method through experiments.
First, in order to evaluate the effectiveness of the distributed representation method (denoted as H-net) proposed in the paper, we compare it with the method of averaging the input pulse sequence (denoted as A-net) and the method of splicing the input pulse sequence (denoted as J-net), and the rest of the model remains unchanged. The experimental results are shown in Tab. 2. The distributed representation method H-net achieved the highest performance. Compared with J-net, its final overall recognition accuracy has increased by 3.07%, indicating that the distributed representation method does help to improve the performance of the model. A-net has the worst performance, with an accuracy rate of only 76.34%, which shows that averaging the input pulse sequence will lose a lot of useful information.
In order to verify the performance of dynamic CNN, and also to verify the effectiveness of the hybrid deep neural network (denoted as DCNN+LSTM) constructed in this paper, we compare it with the non-dynamic hybrid deep neural network (denoted as CNN+LSTM), the hybrid deep convolutional belief neural network (denoted as CNN+DBN) and the hybrid deep belief memory neural network (denoted as DBN+LSTM), while the rest of the model remains unchanged. The non-dynamic CNN components use a composite filter strategy with filter sizes of 7, 5 and 3, numbering 10, 15 and 20 respectively, and the traditional maximum pooling function is used for sampling. The experimental results are shown in Tab. 3. DCNN+LSTM achieves the highest accuracy rate, 0.72% higher than CNN+LSTM, which shows that dynamic CNN can perform feature extraction more flexibly and benefits the overall performance of the network. The performance of CNN+DBN and DBN+LSTM is about 2%-3% lower, which shows that other types of hybrid deep networks cannot improve the performance of radar emitter recognition.
Finally, in order to verify the effectiveness of the attention-based method (denoted as Attention), we compare it with the method of directly connecting dynamic CNN and LSTM using only one fully connected layer (denoted as Direct), while the rest of the model remains unchanged. The experimental results are shown in Tab. 4. The overall recognition accuracy of Attention is 1.21% higher than that of Direct, which shows that the feature fusion method based on the attention mechanism can indeed obtain better deep features.

Comparative Experiment Results
In order to verify the overall superiority of the method in this paper, this experiment constructed the shallow models SVM, NN and the deep models DBN, Autoencoder, CNN, DCNN, LSTM as the baseline systems to compare with the method in this paper. Because the structure of the shallow models is relatively simple, SVM and NN still use the traditional feature representation method, which are recorded as J-SVM and J-NN respectively. Since the effectiveness of the distributed feature representation method has been proved above, DBN, Autoencoder, CNN, DCNN, and LSTM all take the distributed representation features as input, which are respectively denoted as H-DBN, H-Autoencoder, H-CNN, H-DCNN and H-LSTM. We get the recognition accuracy rate of each category of radar emitters, the overall recognition accuracy rate and the running time of the models on the test set. The specific comparative experiment results are shown in Tab. 5, where P 1 r , P 2 r , P 3 r and P 4 r are the recognition accuracy rate of the first category, the second category, the third category and the fourth category of radar emitters respectively.
The experimental results show that compared with other baseline systems, the method in this paper achieves the best overall recognition accuracy rate, which improved by 1.26%. It proves that the feature representation, feature extraction and feature fusion methods used in this paper can obtain multi-level features with different details, and their characterization ability is strong. The shallow models J-SVM and J-NN have the lowest accuracy, which shows that the features obtained by simple models have weak adaptability and are difficult to deal with the complex task of the recognition of radar emitters with agile waveform. Among the remaining deep models, H-LSTM has the highest accuracy rate, reaching 82.45%, indicating that the timing features extracted by LSTM are more important than structural features for radar emitter recognition task. The accuracy of H-DCNN is 1.04% higher than that of H-CNN, indicating that too many or too few filters will have an adverse effect on the recognition results, and the dynamic way can automatically build a more suitable network structure.
The accuracy of all models on the first category of radar emitters is very high, reaching about 95%, mainly because the characteristic parameters of this category of radar emitters are fixed, so they are easy to identify. For the second and third categories of radar emitters, the accuracy of all models is significantly reduced, but the method in this paper has achieved the best performance on both categories of radar emitters. Compared with other models, our accuracy has been significantly improved, reaching 75.22% and 81.22% respectively. The main reason is that the features obtained by the method in this paper can better characterize the information contained in sequence changes and pulse agility. For the fourth category of radar emitters, the method in this paper does not perform well, and only achieves an accuracy of 82.16%, while H-CNN achieves the best performance, and its recognition accuracy reaches 85.01%, indicating that the structural features may be the most important for radar emitters that are agile between dwells.
In terms of running time, J-SVM takes the longest time, up to 86.71 s, which is significantly higher than other neural network models, while J-NN requires the shortest time, only 0.70 s, mainly because its model structure is the simplest, so the required running time is the shortest. The method in this paper combines a variety of different network structures, resulting in a very complex overall model structure, so the required running time has been increased to 1.83 s. In addition, Figure 6 shows the loss curve of our method and other baseline systems. From the figure, it can be seen that our method requires more iterations to achieve convergence, that is, its training time will be longer. Although the method in this paper has a slight extension in running time and training time, it is still within a tolerable range. In summary, the method in this paper is still a more practical solution in terms of the recognition accuracy, running time, and training time.

Conclusion
This paper analyzes the shortcomings of traditional radar emitter recognition methods and conducts a preliminary study of the recognition of radar emitters with agile waveform. We apply the powerful function expression ability and feature extraction ability of the deep network structure to this problem. In addition, to reduce the influence of noise in the complex electromagnetic environment on the characteristic data of radar emitters, we use an attention mechanism-based method for feature fusion. Experiments have shown the effectiveness of each component of the method, and our method achieves good recognition accuracy. In future work, we will make the deep network more lightweight to speed up the model and improve the practicability of our method.