Micro-Doppler-Based Space Target Recognition with a One-Dimensional Parallel Network

Space target identification is key to missile defense. Micromotion, as an inherent attribute of the target, can be used as the theoretical basis for target recognition. Meanwhile, time-varying micro-Doppler (m-D) frequency shifts induce frequency modulations on the target echo, which is referred to as the m-D effect. m-D features are widely used in space target recognition because they reflect the physical attributes of space targets. However, traditional recognition methods require human participation, which often leads to misjudgment. In this paper, an intelligent recognition method for space target micromotion is proposed. First, suitable models of the warhead and decoy are derived, and the corresponding m-D formulae are given. Next, we present a deep-learning (DL) model composed of a one-dimensional parallel structure and long short-term memory (LSTM), and use it to recognize the time-frequency distributions (TFD) of different targets. Finally, simulations are performed to validate the effectiveness of the proposed method.


Introduction
Space target defense is a fundamental aspect of modern air defense operations [1,2]. Currently, target recognition technology is developing rapidly. Micromotion, as an intrinsic attribute, can be used as the theoretical basis for ballistic target feature extraction and recognition [3] and has attracted extensive attention in the field of target recognition. As an inherent feature of a moving target, micromotion is well known to describe weak kinematic features. Thus, m-D features can be used as the theoretical basis for target recognition [4,5]. However, the coexistence of multiple targets in the midcourse phase, each with a different motion form, makes space target identification a complex challenge [6]. Chen et al. established a model of common micromotion forms in [7] and unified the model. In [8], a micromotion model closer to the real situation was constructed by using high-frequency electromagnetic calculation, and the echo data of a precession cone target and a human body were obtained. According to the theory of electromagnetic scattering, the electromagnetic characteristics of a space target at high frequencies are approximately equal to the sum of a number of scattering centers. In [9], scattering centers are divided into localized scattering centers (LSC), distributed scattering centers (DSC), sliding-type scattering centers generated by edge diffraction (SSCE), and sliding-type scattering centers on the curved surface of the space target (SSCS).
Many studies have been conducted on the m-D recognition of space targets by using the cadence velocity diagram (CVD), time-frequency spectrograms, ISAR images, and other traditional methods. For instance, [10] proposed three technical frameworks for m-D classification based on the CVD. The genetic algorithm-general parameterized time-frequency transform (GA-GPTF) method has been proposed to accurately estimate m-D parameters [11]. Kei Suwa used ISAR movie images to extract three-dimensional structural features of a target [12]. However, traditional space target recognition requires manual participation, and the recognition result depends on manual judgment.
In recent years, deep learning (DL) has been widely used in various fields [13,14], and target recognition has also drawn on DL techniques. Kim and Moon [15] used a convolutional neural network to classify the micro-Doppler spectra of different targets. In [16], a DL network was used to recognize high resolution range profile (HRRP) images of different targets, and the recognition accuracy reached more than 90%. In [17], a new all-convolutional network was proposed to reduce the number of free parameters in the intelligent target recognition of SAR images. Compared with other images, time-frequency spectrograms have a strong temporal correlation, which can better reflect the m-D effect of space targets. Hence, in this paper, we propose a network structure that uses time-frequency spectrograms to classify space targets. The remainder of this paper is organized as follows. The models of the warhead and decoy are introduced in Section 2. The deep learning network design of the proposed method is described in detail in Section 3. The experiments that verify the accuracy of the method are presented in Section 4. Finally, in Section 5, we present the conclusions.

Model of Warhead and Decoy
The warhead moves in the form of precession or nutation because it is affected by spin and lateral disturbance. The imitation decoy, by contrast, has no attitude-adjusting device and therefore moves by rotation or vibration [18].

Decoy Mode. As shown in Figure 1, the radar coordinate system (O_1 − UVW) and the target coordinate system (O − XYZ) are left-handed coordinate systems. According to scattering center theory, point P can be considered an LSC.
For a rotating decoy, the target rotation axis is a straight line passing through the point O; the azimuth and elevation angles of the target coordinate system are α and β, respectively; the azimuth and elevation angles between the radar line of sight (LOS) and O − XYZ are α′ and β′, respectively. The distance between O and O_1 is R_0, and the distance between P and O_1 is R′. According to [7], R′ at time t is given by equation (1), where r = (cos α′ cos β′, sin α′ cos β′, sin β′)^T, ω_r is the rotation angular velocity, and a is the matrix related to the angle parameters of the rotation axis. The matrix a is described by the following expression.
In equation (1), the micromotion of the rotating target conforms to the law of sinusoidal modulation, which is affected by factors such as the radar line of sight and the target rotation attribute.
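The sinusoidal modulation noted above can be illustrated numerically. The following is a minimal sketch, not equation (1) itself: it assumes a simplified scalar range model R(t) = R_0 + A sin(ω_r t + φ) and the sign convention f_d = (2/λ) dR/dt. The amplitude A = 0.5 m, the phase φ, and the function names are hypothetical; the rotation rate 8π rad/s and the 10 GHz carrier are taken from the simulation parameters of this paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def micro_doppler(t, amp, omega, phase, wavelength):
    """Analytic m-D shift (Hz) for the assumed range R(t) = R0 + amp*sin(omega*t + phase)."""
    return (2.0 / wavelength) * amp * omega * math.cos(omega * t + phase)


def micro_doppler_numeric(t, amp, omega, phase, wavelength, dt=1e-6):
    """Same quantity from a central difference of the range, as a sanity check."""
    def rng(x):
        return amp * math.sin(omega * x + phase)
    return (2.0 / wavelength) * (rng(t + dt) - rng(t - dt)) / (2.0 * dt)


wavelength = C / 10e9      # about 0.03 m at a 10 GHz carrier
omega_r = 8 * math.pi      # rotation angular velocity of the decoy (rad/s)
amp = 0.5                  # hypothetical scatterer offset from the axis (m)

# Peak m-D shift of the sinusoid: 2 * A * omega / lambda (about 838 Hz here)
peak = 2.0 * amp * omega_r / wavelength
```

Under these assumed values the peak shift stays below half of the 2000 Hz pulse repetition frequency, so the sinusoidal trace would not alias in the spectrogram.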
For a vibrating decoy, A_V is the vibration amplitude and ω_v is the vibration angular velocity. The vibration direction and the LOS direction are each represented by a vector in the target coordinate system. At time t, the distance between P and O_1 is R′_v(t), which is described by equation (3). From equation (3), the distance from the vibrating target to the radar is modulated sinusoidally.

Warhead Mode. In Figure 2(a), the center of mass of the warhead is defined as the origin of the coordinate system. The spin angular velocity of the target is ω_s, the cone rotation angular velocity is ω_c, and the azimuth and elevation angles of the target symmetry axis are α and γ, respectively. The angle between the LOS and the O − Z axis is θ_0. In particular, point A is fixed on the top of the warhead and can be treated as a localized scattering center. Points B and C are located at the edge of the warhead; their positions slide with the movement of the target [9], so they can be considered SSCE. According to [19], the m-D distance of each scattering point is given by equation (4), where t is the time variable, r is the radius of the target bottom, and h_1 and h_2 are the distances from the top of the cone to the center of mass and from the center of mass to the center of the bottom, respectively. cos β(t) can be expressed as follows:
According to equation (4), the m-D of the sliding scattering points does not conform to the sinusoidal modulation law. A warhead in nutation undergoes an additional swing superimposed on precession. According to the derivation, the nutation m-D distance of each scattering point can also be expressed by equation (4); the only difference is that cos β(t) is modulated by the swing parameters. Let θ_b denote the swing amplitude, ϕ_b the elevation angle at the initial time, and ω_b the swing angular velocity.
Thus, we can rewrite cos β(t) as follows:
Moreover, some space targets have a special streamlined structure, as shown in Figure 2(b). Point D can be considered an SSCS. The angle between the LOS and the O − X axis is ε, and a and b are the lengths of the major and minor axes of the streamlined structure, respectively. The coordinates of point D, r = (x_t, y_t, z_t), can be expressed as follows:
… · (−sin θ_0 sin ω_c t), … · (sin θ_0 cos γ cos ω_c t + cos θ_0 sin γ), …
The vector representation of the LOS direction can be described as follows:
The m-D distance of the SSCS can be expressed as follows:
where R_rot(t) is the transfer matrix for a rotation of angle θ around the Z axis. R_rot(t) can be deduced by the following:
The m-D characteristics of the SSCS do not conform to the sinusoidal modulation law.

Figure 3 shows the overall network architecture for space target recognition, which consists of a time-frequency transform, 1-D parallel structures for local feature learning, an LSTM layer for global temporal information extraction, and softmax for classification. In traditional image recognition, the image is 2-D, so the convolution layers in the network usually adopt a 2-D structure. In this paper, considering the temporal correlation of the images, we treat each time-frequency spectrogram as a 1-D image along the frequency dimension with multiple channels along the time dimension.

Parallel Structure.
The development of deep learning (DL) has led to many breakthroughs in the field of target recognition. Unlike traditional manual target recognition, intelligent target recognition based on DL can extract features automatically.
International Journal of Antennas and Propagation
As a DL method, the convolutional neural network (CNN) transforms the original data into a more abstract representation through simple nonlinear models. Many scholars have designed different network structures based on CNN, such as AlexNet [20], VGG-16 [21], and VGG-19 [22]. Unfortunately, these networks have a deep sequential structure: their convolution layers are linearly connected, so each convolution layer extracts only a single type of feature at a time.
To extract different features, we introduce a 1-D parallel structure into the proposed architecture.
The parallel structure is shown in Figure 4. It uses three different types of 1-D convolution kernels, namely, 3 × K, 5 × K, and 7 × K kernels, in which K represents the kernel width. The design strategy for the architecture is based on fractal theory. To reduce the computation and expand the network depth, we take a 3 × K convolution layer as the initial layer, connect two initial layers in series, and then use the join operation to merge this connected structure with the 5 × K convolution layer, which forms the second-layer framework. We then connect two second-layer frameworks in series and use the join operation to merge the connected structure with the 7 × K convolution kernels. In this structure, one 5 × K max pooling is also performed. For brevity, the ReLU activation function is not listed.
Rather than learning a single type of feature with one independent convolution kernel, the parallel structure automatically extracts different features and can handle more complex tasks. Different processing branches use different convolution kernels, so the module captures features at different scales. In this way, good results can be obtained in space target recognition.
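The recursive construction described above can be sketched in a few lines. This is an illustrative simplification under stated assumptions, not the network of Figure 4: it works on a single-channel 1-D signal (the kernel width K is dropped), uses fixed averaging kernels instead of learned weights, omits the max pooling, and assumes that the join operation is an element-wise mean, as in FractalNet-style designs. All function names are hypothetical.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output has the same length as x."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def relu(x):
    return [max(0.0, v) for v in x]


def join(a, b):
    """Assumed join operation: element-wise mean of two branches."""
    return [(u + v) / 2.0 for u, v in zip(a, b)]


def conv_block(x, size):
    """Convolution with a fixed averaging kernel of the given length, then ReLU."""
    return relu(conv1d_same(x, [1.0 / size] * size))


def level1(x):
    # Initial layer: a single kernel of length 3
    return conv_block(x, 3)


def level2(x):
    # Two initial layers in series, joined with a parallel length-5 branch
    return join(level1(level1(x)), conv_block(x, 5))


def level3(x):
    # Two second-level frameworks in series, joined with a length-7 branch
    return join(level2(level2(x)), conv_block(x, 7))


features = level3([1.0] * 16)
```

The point of the sketch is the wiring: the deep branch and the shallow branch see the same input with different receptive fields, and the join merges them into one feature sequence of unchanged length.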

LSTM.
Because the traditional recognition method uses only the envelope information of the time-frequency spectrogram and does not consider the temporal correlation between the frequency cells, we adopt the long short-term memory (LSTM) model to process the spectrogram.
LSTM is an efficient time series processing unit and has been widely applied [23,24]; it handles time series by learning the long-term dependence between the time steps of the sequence data. The structure of the LSTM is illustrated in Figure 5. The core of the LSTM is the cell state, which runs through the whole cell. Information can be removed from or added to the cell state through structures called gates. The LSTM includes a forget gate, an input gate, and an output gate. The forget gate determines what information should be discarded from the cell state. The input gate uses the sigmoid function to decide what new information is useful for the cell state, while the tanh function produces the new candidate cell information. The output gate multiplies the sigmoid-filtered information by the cell state passed through the tanh layer, which maps it to a vector between −1 and 1, to obtain the output of the structure.
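The gate mechanism described above can be written out for a single time step. The sketch below uses the standard LSTM update with scalar input and scalar state for readability; the parameter layout (a dictionary of per-gate weights and biases) and the function names are assumptions made for illustration, not the configuration used in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step; params maps gate name -> (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # what to discard from the cell state
    i = sigmoid(pre("input"))         # how much new information to admit
    g = math.tanh(pre("candidate"))   # new candidate cell information
    o = sigmoid(pre("output"))        # how much of the cell state to expose

    c = f * c_prev + i * g            # updated cell state
    h = o * math.tanh(c)              # output, bounded to (-1, 1)
    return h, c


# Example with all weights and biases set to zero (hypothetical values):
# every sigmoid gate is 0.5 and the candidate is 0, so the cell state halves.
zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "candidate", "output")}
h_out, c_out = lstm_step(1.0, 0.0, 2.0, zero_params)
```

In a full layer the scalars become vectors and the weights become matrices, but the order of operations is the same.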
After the 1-D parallel structure, the output can be regarded as frequency feature vectors arranged along the time dimension; therefore, we can use the LSTM to learn the global temporal information of these vectors.
Figure 3: Schematic diagram of the proposed method.

Simulation Results
In this section, the system implementation details and the performance analysis are introduced first. Then, the recognition performance of different network configurations is analyzed.

System Implementation Details and Performance Analysis.
Some radar parameters are as follows: the radar carrier frequency is approximately 10 GHz, the pulse repetition frequency is 2000 Hz, the pulse width is 10 μs, and the observation time is 2 s. The target parameters are as follows: Target 1 is a precession cone warhead; its height is 2.4 m, its radius is 0.5 m, and its cone rotation angular velocity is ω_c = 4π rad/s. Target 2 is a nutation cone warhead; its height is 2 m, its radius is 0.5 m, and its cone rotation and swing angular velocities are ω_c = 4π rad/s and ω_b = 10π rad/s, respectively. Target 3 is a rotating decoy; its height is 2.18 m, its radius is 0.52 m, and its rotation angular velocity is ω_r = 8π rad/s. Target 4 is a precession warhead with a streamlined structure that has an SSCS; its cone rotation angular velocity is ω_c = 4π rad/s, and its major and minor axes are 2.5 m and 0.5 m, respectively. Target 5 is a vibrating decoy; its height is 1.92 m, its radius is 0.48 m, and its vibration angular velocity is ω_v = 10π rad/s.
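A few derived quantities follow directly from the radar parameters listed above and help in reading the spectrograms. The short calculation below is a sketch; the variable names are illustrative, and the carrier is taken as exactly 10 GHz although the text gives it only approximately.

```python
import math

C = 299_792_458.0        # speed of light (m/s)

carrier_hz = 10e9        # "approximately 10 GHz" in the text
prf_hz = 2000.0          # pulse repetition frequency
pulse_width_s = 10e-6    # pulse width
t_obs_s = 2.0            # observation time

wavelength = C / carrier_hz                # about 0.03 m
n_pulses = round(prf_hz * t_obs_s)         # slow-time samples per observation
max_doppler_hz = prf_hz / 2.0              # unambiguous Doppler is +/- PRF/2
duty_cycle = pulse_width_s * prf_hz        # fraction of time transmitting

# Cone rotation at omega_c = 4*pi rad/s has a period of 0.5 s,
# so one 2 s observation spans four precession cycles.
omega_c = 4.0 * math.pi
precession_cycles = t_obs_s / (2.0 * math.pi / omega_c)
```

These numbers mean that each echo contains 4000 slow-time samples, that micro-Doppler shifts beyond ±1000 Hz would alias, and that several full micromotion periods are visible in each spectrogram.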
The short-time Fourier transform (STFT) is an effective time-frequency transform method. Its basic idea is to use a window function h(t) to extract the signal in a short time interval and to use the FFT to analyze the signal frequency in each interval. In this paper, we use the window function h(t) to divide the time-frequency spectrogram into 153 parts, which are then fed to the 1-D parallel network. The time-frequency spectrograms of the five targets are illustrated in Figure 6.
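The windowing idea behind the STFT can be sketched as follows. This is a minimal illustration rather than the processing chain used for Figure 6: it uses a direct DFT in place of the FFT to stay self-contained, a Hann window as the assumed h(t), and a hypothetical window length of 48 samples with a hop of 26 samples. Those two values are not stated in the paper; they are chosen only because they happen to split 4000 samples (2 s at 2000 Hz) into 153 frames.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (assumed choice of h(t))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft(frame):
    """Direct discrete Fourier transform (stands in for the FFT)."""
    n = len(frame)
    return [sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def stft(signal, win_len, hop):
    """Magnitude STFT: one list of win_len frequency bins per time frame."""
    window = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        segment = [signal[start + m] * window[m] for m in range(win_len)]
        frames.append([abs(v) for v in dft(segment)])
    return frames


# Test signal: a complex tone at 250 Hz sampled at the 2000 Hz PRF for 2 s.
fs = 2000
tone = [cmath.exp(2j * math.pi * 250 * i / fs) for i in range(2 * fs)]
spec = stft(tone, win_len=48, hop=26)
```

With a 48-sample window the bin spacing is 2000/48 Hz, so the 250 Hz tone falls exactly on bin 6 in every frame; a micromotion echo would instead trace a time-varying ridge across the frames.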
In Figure 6, the features of the different m-D time-frequency spectrograms are consistent with the theoretical analysis in Section 2. Measured samples of space targets are difficult to obtain, so all data were generated by electromagnetic simulation. The sampling interval and the range of each parameter are shown in Table 1. Data were generated at SNRs from −10 dB to 10 dB in steps of 2 dB. The total number of target time-frequency spectrograms was 11 × 1000. In this paper, 70% of the collected time-frequency spectrograms are used as training images, while the remaining 30% are used as test images.
The initial configuration of the deep learning recognition network is shown in Table 2. For brevity, the ReLU activation function is not listed.
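The data set bookkeeping described above can be checked with a short calculation. The sketch reads "11 × 1000" as 11 SNR levels with 1000 spectrograms each, which is an interpretation of the text rather than something it states explicitly; the helper noise_power and all variable names are hypothetical.

```python
# SNR grid from -10 dB to 10 dB in 2 dB steps
snr_levels_db = list(range(-10, 11, 2))

per_level = 1000                      # assumed number of spectrograms per SNR level
total = len(snr_levels_db) * per_level

n_train = total * 7 // 10             # 70% for training
n_test = total - n_train              # remaining 30% for testing


def noise_power(signal_power, snr_db):
    """Noise power that yields the requested SNR (in dB) for a given signal power."""
    return signal_power / (10.0 ** (snr_db / 10.0))
```

At the lowest level of −10 dB the noise power is ten times the signal power, which is consistent with the drop in recognition accuracy reported at low SNR.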
For space target recognition, the signal-to-noise ratio (SNR) is an important factor that affects the recognition accuracy. Therefore, in this paper, we focus on the recognition performance of the network under different SNRs. The detailed results are shown in Table 3. The confusion matrix of the network at −10 dB is illustrated in Figure 7. The remaining confusion matrices are not displayed here because of space constraints. From Table 3 and the confusion matrices, we can see that the recognition accuracy increases as the SNR improves. When the SNR reaches 10 dB, the accuracy is close to 1. As the SNR decreases, noise makes it difficult to identify the m-D features in the target time-frequency spectrograms, and the classification accuracy is reduced. Note that even when the SNR is relatively low, the network can still recognize the three motion forms of rotation, vibration, and nutation with high accuracy.
To further verify the recognition performance of the network proposed in this paper, we applied different state-of-the-art networks to the same data set for comparison.
In this paper, the training parameters of the traditional networks are set as follows: the mini-batch size for each training iteration was 128, and the maximum number of epochs was 24. The learning rate was initially set to 0.001 and was decreased by a factor of 10 after every 10 epochs, so that it was decreased twice in total. We trained the networks using stochastic gradient descent with momentum. Figure 8 shows the root-mean-square error (RMSE) over the epochs for the traditional networks at 6 dB, which gives an overview of the training process. As shown in Figure 8, the RMSE gradually decreased and stabilized. The test recognition accuracy of the different networks as a function of SNR is illustrated in Figure 9.
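The learning-rate schedule described above is a step decay and can be written down directly. The sketch assumes zero-based epoch indices and uses hypothetical names; it is independent of any particular training framework.

```python
def learning_rate(epoch, base_lr=1e-3, drop_factor=0.1, drop_every=10):
    """Step decay: multiply the rate by drop_factor after every drop_every epochs."""
    return base_lr * drop_factor ** (epoch // drop_every)


max_epochs = 24
schedule = [learning_rate(e) for e in range(max_epochs)]

# Count how often the rate changes over the run (at epochs 10 and 20).
n_drops = sum(1 for a, b in zip(schedule, schedule[1:]) if b < a)
```

With 24 epochs and a drop every 10 epochs, the rate takes three values (0.001, 0.0001, and 0.00001), which matches the statement that it was decreased twice in total.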
As depicted in Figure 9, the recognition accuracy of the network proposed in this paper is higher than that of the traditional network. Furthermore, the classification performance is better, and the robustness is stronger when the SNR is low.
To verify the effectiveness of the method, the proposed network is compared with two classification methods: the support vector machine (SVM) classifier [25] and m-D threshold recognition [26]. The recognition accuracy of these existing micromotion feature extraction algorithms for SNRs from −10 dB to 10 dB is shown in Figure 10.
It can be seen from Figure 10 that the recognition accuracy of the network proposed in this paper is the highest. When the SNR is below 0 dB, the recognition accuracy of the traditional classification methods is less than 60%, while the classification accuracy of the deep learning model remains relatively high, at around 95%. Traditional micromotion recognition methods generally need an SNR above 6 dB to obtain a higher accuracy, which indicates that the proposed network has better anti-noise performance than the traditional algorithms.

Analysis of Network Configurations on Recognition Performance.
In this section, we focus on the influence of network configurations on network performance. Based on the initial network structure, we changed the number of one-dimensional parallel structures or LSTM units in order to observe their effects. As presented in Table 4, the number of LSTM units is doubled at each step from 128 to 512, following the principle that the number of LSTM units (N) is usually half of, or equal to, the kernel width of the previous layer (K). The number of one-dimensional parallel structures is increased from 5 to 11 in steps of 2. Similarly, we trained the differently configured networks under different SNR conditions. The comparison results are illustrated in Figure 11.
From Figure 11, we can observe that 256 LSTM units yield a higher classification accuracy. As for the parallel structure, performance is significantly better when the number of structures reaches 7 or 9. Unfortunately, errors are inevitable because the time-frequency spectrograms are strongly affected by noise at low SNR (−10 dB): the recognition accuracy of the network remains below 0.8 regardless of how the network configuration is adjusted.

Conclusions
To address space target identification in a ballistic missile defense system, we divided space targets into warheads and decoys according to their movement modes and proposed a new network based on the parallel structure and LSTM units. In practical applications, recognition based on time-frequency spectrograms is greatly affected by noise. To accurately evaluate the recognition performance of the proposed network, we obtained its recognition accuracy under different SNR conditions. Through comparison, it was found that the recognition accuracy of the proposed network is better than that of the traditional networks. More importantly, we optimized the network by searching over the number of parallel structures and LSTM units.
It is worth noting that the space target recognition in this paper is implemented on the premise that the group target has already been separated; that is, the process of group target signal separation is ignored. However, the separation of group target signals is a very complex issue, and its result directly affects the subsequent target recognition. Therefore, the focus of our future research is the separation of the target signals of complex space groups.

Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest.