Multi-time frequency analysis and classification of a micro-drone carrying payloads using multistatic radar

This article presents an analysis of three multi-domain transformations applied to radar data of a micro-drone operating in an open field, with a payload (between 200 and 600 g) and without a payload. Inferring the presence of a drone attempting to transport a payload beyond its normal operating conditions is a key enabler in prospective low altitude airspace security systems. Two scenarios of operation were explored, the first with the drone hovering and the second with the drone flying. Both were accomplished through real experimental trials, undertaken with the multistatic radar NetRAD. The images generated by the domain transformations were fed into a pretrained convolutional neural network (CNN) known as AlexNet and were treated as a six-class classification problem. Very promising accuracies were obtained, averaging 95.1% for the hovering case and 96.6% for the flying case. The activations that this variety of images triggered within the CNN were then visualised to better understand the specific features that the network was learning and distinguishing between in order to achieve classification.


Introduction
Micro-drones and unmanned aerial vehicles (UAV) have exploded in popularity over recent years and have taken the hobbyist technology market by storm. The success of these systems in global markets is owed to manufacturing developments in electronic control sensors such as gyroscopes and accelerometers, to the chemical compositions of batteries (allowing for longer flight times, higher power delivery and greater lift) and finally to more efficient microcomputers with increased computing power to solve and maintain the necessary stabilisation algorithms [1].
These platforms have a wide scope of constructive applications, such as remote inspection, agricultural monitoring, photography or filming, search and rescue, police surveillance, and package delivery [2]. Due to the wide array of uses for these systems, a great deal of interest has been generated among various companies, most notably Amazon, which is looking to roll out a fully-fledged delivery service for small parcels that could fulfil up to 90% of its orders [3]. The predicted global market value for the integration of these systems in these applications is tremendous, of the order of $100 billion [4]. Transition to large-scale drone based services therefore appears to be inevitable and has also been claimed to be the next historical revolution [5].
Laws and regulations regarding the use of these platforms have been drawn up and passed in the US [6], although many other countries still do not have appropriate laws in place [7]. A bigger issue is that there is no active means of enforcing such laws, as police do not have the equipment to combat or prevent the unauthorised use of drones. There is extraordinary potential to misuse these readily available, easy to use systems for an array of malicious applications, such as illegal reconnaissance of controlled areas, trafficking of illegal substances, deployment of harmful chemical agents or triggering of mobile improvised explosive devices [8][9][10]. The last of these is arguably one of the principal concerns, as it has the greatest capability to inflict serious harm, especially in built-up, highly populated locations of interest. Consequently, there is a strong incentive to identify, act on and successfully prevent the aforementioned dangerous scenarios.
Radar systems are well suited to detect and track such a dynamic and agile threat [11], as it is possible to recognise such targets at very long ranges, through all light and most weather conditions (depending on operating frequency). There are currently systems on the market which are specifically designed for detecting, tracking and classifying drones [12], though these rely on electro-optical (EO) sensors for classification, which suffer at range and in challenging weather conditions. Whilst these EO systems can decide whether the target of interest is for example a bird or drone with good confidence [13], to the best of our knowledge there is currently no implemented algorithm or method for classifying whether the drone is carrying a payload or not. This is in fact a broad topic which remains open for investigation. It would be most preferable to undertake this task throughout the widest array of possible weather conditions, making a radar solution all the more desirable.
In this paper, we present preliminary results of a classification approach based on deep learning and multiple time-frequency representations of the Doppler signature of the drone, namely the spectrogram, the cepstrogram, and the cadence velocity diagram (CVD). The method is applied to classify data of a commercial model of small drone which is carrying payloads of different weights or none, both for the case of hovering and flying movements.
The rest of this paper is organised as follows: Section 2 describes the measurement setup and the data collection process, Section 3 discusses the signal processing techniques and examines the preliminary processed spectra, Section 4 presents the data configuration and the results acquired when the data is applied to the convolutional neural network (CNN) and finally, Section 5 concludes the work and presents areas for future development.

Measurement setup & data collection
The multistatic pulsed radar system NetRAD was used to obtain the data presented in this paper. It is a coherent pulsed Doppler radar consisting of three separate but identical nodes operating at 2.4 GHz (S Band) [8,14]. The radar parameters used for the experiments were set and fixed to: 45 MHz bandwidth, 0. azimuth and elevation, respectively. H-Pol was chosen as the components of interest (primarily the blades) rotate in the horizontal direction and will hence provide a greater return in that plane [15].
The experimental campaign took place in July 2015 at the UCL sports ground in North London. The three NetRAD nodes were deployed in a linear baseline formation with an inter-nodal separation of 50 m, as shown in Fig. 1 [16]. The drone used was a DJI Phantom 4 Vision 2+ quadcopter [17], with the camera gimbal removed so that metal disks of mass 100 g each could be mounted to the bottom of the drone, simulating a payload of a chosen weight. The DJI Phantom had an initial weight of 1.2 kg and tests were carried out with the drone hovering and flying, with progressively increasing payloads starting from no payload (0 g), through 200, 300, 400, 500 and 600 g. It should be noted that the drone was not able to fly with 600 g, so that data class had to be withdrawn from the flying data set. A total of 18 recordings were generated for the hovering class and 15 recordings for the flying class. The Phantom had two blades per rotor and was retrofitted with carbon fibre blades of length 11 cm (per blade), which maximised the reflected energy [15].

Signal processing and analysis
The raw samples were collected from all the radar nodes post-trial and were processed and analysed offline. The raw data was Hilbert transformed and matched filtered against a signal reference bank, increasing the SNR, and was then normalised to produce the range time intensity (RTI) plot. To automatically process the data, a constant false alarm rate (CFAR) detector was designed and implemented, first to identify the range cells in which the drone operated and secondly to determine whether the drone was flying or hovering. Although this information is already known through the recording labels, a complete framework was designed to process the RTI plots reliably without human intervention, to prove the feasibility of a complete detection, tracking and classification scenario [18].
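The detection step can be illustrated with a minimal cell-averaging CFAR sketch. The text does not state which CFAR variant was implemented, so cell averaging is an assumption, as are the function name `ca_cfar`, the synthetic range profile and the even split of training and guard cells on either side of the cell under test. The 10 training cells, 2 guard cells, false alarm probability of 0.01 and search span of cells 20 to 80 are the values quoted in this paper.

```python
import random


def ca_cfar(power, n_train=10, n_guard=2, pfa=0.01, start=20, stop=80):
    """Return indices of range cells whose power exceeds the CA-CFAR threshold.

    n_train and n_guard are totals, split evenly on either side of the cell
    under test (an assumption; the arrangement is not given in the paper).
    """
    half_train = n_train // 2
    half_guard = n_guard // 2
    # Threshold multiplier for cell averaging over exponentially distributed noise power.
    alpha = n_train * (pfa ** (-1.0 / n_train) - 1.0)
    detections = []
    for cut in range(start, stop + 1):
        lead = power[cut - half_guard - half_train:cut - half_guard]
        lag = power[cut + half_guard + 1:cut + half_guard + 1 + half_train]
        noise = (sum(lead) + sum(lag)) / (len(lead) + len(lag))
        if power[cut] > alpha * noise:
            detections.append(cut)
    return detections


# Synthetic range profile: exponential noise plus a strong target in cell 35.
random.seed(0)
profile = [random.expovariate(1.0) for _ in range(110)]
profile[35] += 200.0
hits = ca_cfar(profile)
print(hits)
```

Isolated false alarms may also appear in the output, which is expected for a false alarm probability of 0.01 evaluated over 61 cells.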
After the ranges of interest had been identified by the CFAR, a short time Fourier transform (STFT), given by (1), was applied over these range cells to generate the double-sided spectrogram. A Hamming window was used as the discrete window function w. The discrete frequency index k can be converted to the Doppler frequency f through (2), where N is the number of samples within the short time window and f_s is the sampling frequency of the radar, which after pre-processing is equal to the operating PRF of 5 kHz. To convert between the Doppler frequency and velocity, (3) can be used, where λ is the operating wavelength of the radar. To obtain the angular rotation rate of the propellers, (4) is used, where θ is the angle between the incident radar beam and the drone heading and r is the length of a single blade on the rotor. Using the STFT, a cepstrogram was generated by taking the natural logarithm of the absolute energy of the STFT signal. The inverse discrete Fourier transform (IDFT) was then applied over each short time window in much the same way as (1), and finally the absolute energy was taken, giving the unique frequency dimension (indexed by q) in pseudo units of seconds [19]:

$$\mathrm{Ceps}[q,n] = \left|\frac{1}{N}\sum_{k=0}^{N-1}\ln\left(\left|\mathrm{STFT}[k,n]\right|^{2}\right)e^{j2\pi kq/N}\right|^{2}$$

Finally, a CVD plot was generated by performing a DFT over each time vector for each discrete Doppler frequency. The discrete cadence set can be converted to cadence frequency f_cad from (2) and to velocity through (3). An example of the plots generated by the applied time-frequency domain transformations is shown collectively in Fig. 2.
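The three transformations described above can be sketched with the Python standard library alone. This is an illustration under stated assumptions rather than the processing chain used for the trials: the naive `dft` helper, the 64-sample window, the hop of 16 samples, the small `eps` guard inside the logarithm and the synthetic 875 Hz tone are all hypothetical choices, and a practical implementation would use an FFT library. Only the Hamming window, the 5 kHz PRF and the order of operations are taken from the text.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT, adequate for a short illustration."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def stft(signal, n_win, hop):
    """Hamming-windowed STFT: a list of frames, each holding n_win Doppler bins."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n_win - 1)) for i in range(n_win)]
    frames = []
    for start in range(0, len(signal) - n_win + 1, hop):
        frames.append(dft([signal[start + i] * window[i] for i in range(n_win)]))
    return frames


def cepstrogram(frames, eps=1e-12):
    """|IDFT(ln |STFT|^2)|^2, computed over each short time window."""
    return [[abs(c) ** 2 for c in dft([math.log(abs(v) ** 2 + eps) for v in f], inverse=True)]
            for f in frames]


def cvd(frames):
    """Magnitude of the DFT taken along time for every Doppler bin."""
    n_bins = len(frames[0])
    return [[abs(c) for c in dft([abs(f[k]) for f in frames])] for k in range(n_bins)]


# Synthetic complex tone at 875 Hz sampled at the 5 kHz PRF.
fs, f0, n_win = 5000.0, 875.0, 64
x = [cmath.exp(2j * math.pi * f0 * t / fs) for t in range(512)]
frames = stft(x, n_win, hop=16)
peak_bin = max(range(n_win), key=lambda k: abs(frames[0][k]))
doppler_hz = peak_bin * fs / n_win  # bin index to Doppler frequency
print(peak_bin, doppler_hz)
```

Because the tone has constant amplitude, its spectrogram magnitude does not change over time and the CVD row for the peak Doppler bin is concentrated at zero cadence frequency. A rotating blade would instead produce periodic modulation and hence non-zero cadence components.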
The range time plot after the signal pre-processing is shown in Fig. 2a, with the drone operating between range cells 34 and 36 for the hovering case. The CFAR algorithm searches for targets between cells 20 and 80, as this agrees with the size of the field, and uses 10 training cells with 2 guard cells and a desired probability of false alarm of 0.01. These parameters reliably detect the target drone in all cases and across the three radar nodes. It should be noted that range cell 10 is the direct feedback equivalent to 0 m distance and cell 105 is the tree line at the end of the field. Applying (1) to the target in Fig. 2a generates the double-sided spectrogram in Fig. 2b. The observed Doppler components agree well with the operating specifications of the DJI Phantom, with the absolute maximum rotational rate quoted at ∼7000 rpm under hovering conditions [20]. Applying (4) to the strongest Doppler component in the spectrum, which is at 875 Hz, yields a rotational rate of 5222 rpm. The observed spread of the Doppler components is due to RF energy being reflected at different points along the blade, and the components appear to be consistently spaced at 220 Hz [21]. The component at 1350 Hz is believed to be an out of spectrum intermodulation product, as it implies a rotational rate of 8050 rpm, which is beyond the operating conditions of the drone during the experiment; additionally, the response is on average 15 dB weaker than the other components.
The cepstrogram, after applying (2) on the spectrogram, is shown in Fig. 2c. The purpose of the cepstrogram in this context is to reveal periodic details in the spectrogram. It achieves this by log scaling the spectral content within each short time window and then performing an IDFT. As a result, the energy contained in specific Doppler components is more evident over time, regardless of the integration time used [11].
The CVD shown in Fig. 2d exposes the frequency content within each Doppler cell over the entire integration period. Due to the relatively large period, a 2048-point FFT was required (twice that of the previous transformations). There is a great deal of information contained within the CVD, as it captures spread, shape and repetitive Doppler patterns from the spectrogram [22]. The plot in Fig. 2d confirms that 875 Hz is indeed the principal reflective component, as indicated by the spectral strength and frequency content caused by the tips of the propellers. In the other frequency transformations, it was perhaps not so obvious that this was the case. That said, this representation would be better suited to radars operating at a higher frequency (X-Band or greater), offering improved Doppler resolution so that precise cyclic information within the Doppler spectrum could be resolved in detail, allowing useful information to be drawn from the cadence frequency plot. However, simply increasing the operating frequency can lead to other unwanted effects, such as contention with other sources of noise, either atmospheric or arising naturally from the increased bandwidth throughout the RF stages that is necessary for the Doppler resolution. This is a trade-off that would have to be investigated closely in a prospective radar design.
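The conversion from Doppler frequency to rotation rate can be checked with a short worked example. The sketch assumes the standard monostatic relation v = fλ/2 and a tip-velocity relation ω = v/(r cos θ). The function name `doppler_to_rpm`, the rounded speed of light, the 0.10 m radius and θ = 0 are illustrative assumptions and not values stated in the text. They are chosen because they reproduce the quoted figures.

```python
import math

C = 3.0e8          # speed of light in m/s (rounded; an assumption)
F_CARRIER = 2.4e9  # NetRAD operating frequency in Hz


def doppler_to_rpm(f_doppler_hz, radius_m=0.10, theta_rad=0.0):
    """Convert a blade Doppler shift to rotor speed in rpm.

    Assumes v = f * lambda / 2 and omega = v / (r * cos(theta)); radius_m and
    theta_rad are illustrative assumptions, not values taken from the paper.
    """
    wavelength = C / F_CARRIER
    velocity = f_doppler_hz * wavelength / 2.0
    omega = velocity / (radius_m * math.cos(theta_rad))
    return omega * 60.0 / (2.0 * math.pi)


print(round(doppler_to_rpm(875.0)))
print(round(doppler_to_rpm(1350.0)))
```

Under these assumptions the 875 Hz component maps to roughly 5222 rpm and the 1350 Hz component to just over 8050 rpm. Passing `radius_m=0.11`, the blade length given for the retrofitted blades, lowers both values by about 9%, so the effective radius and geometry used in (4) should be confirmed against the original equation.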

Results
Due to the vast amount of information contained within the time-frequency transformations (Figs. 2b-d), an approach had to be sought whereby the principal features would be automatically identified and weighted appropriately. This eliminates the need to determine the features of interest beforehand, which would traditionally have been achieved by developing a feature extraction algorithm, where errors would depend on the thresholds set by the user and, more generally, on the fine-tuning of parameters in the feature extraction or selection algorithm. This was achieved by implementing a CNN, more specifically the pre-trained network known as AlexNet [23]. It consists of five convolutional layers (CL) and three fully connected (FC) layers, utilising the Rectified Linear Unit (ReLU) activation function after every CL and FC [24]. AlexNet was designed to classify 1000 classes from a dataset of 1.2 million images. It therefore had to be reconfigured by severing the last three layers and replacing them with an FC layer with the appropriate number of classes (i.e. 6), followed by a softmax and a classification output layer. It also required the input images to be downsampled to dimensions of 227 × 227 pixels, resulting in some loss of information, although upon inspection this was acceptable. Fig. 3 shows the layout of AlexNet: the first five layers are CLs, with non-essential connections dropped out, and the last three are FC layers.
Both hovering and flying scenarios were treated as a pseudo six-class problem, in which only the distinction between no payload and payload (200-600 g) was of interest, and all three time-frequency representations were independently fed into AlexNet, as detailed by the summary of classes in Table 1.
The time-frequency plots obtained from Figs. 2b and c were cut in half at the 0 Hz boundary (positive and negative portions of the Doppler spectrum) and were split into time windows of 5 s each, producing 12 times the number of plots per dataset. It should be noted that the plots are not symmetrical about 0 Hz, as the FFT is applied to the complex range time data. The CVD from Fig. 2d was re-computed for each of the dissected plots, as it does not have a time dimension which can be divided.
Fig. 3 Modified architectural diagram of AlexNet [24]

As this was effectively a two-class problem, there was significant class imbalance in the number of samples between no payload and payload (1:5). In preliminary tests with the CNN it was discovered that the network completely ignored the characteristics of the no payload class, as it was able to achieve an accuracy of 80% by consistently classifying the input as belonging to the payload class. This was remedied by augmenting the no payload samples with additive white Gaussian noise (AWGN) of linearly increasing variance from 0.02 up to 0.05, making the number of samples available for the two cases equal. Although this technique is usually reserved for data sets of a much larger volume, it was retained because Node 2 had a recording issue which resulted in unusually noisy plots for 1/3 of the total generated samples, making this solution reasonable for this problem. In addition, data augmentation was applied over all the training samples before feeding them into the CNN. This consisted of Y reflections (mirror on X-axis), doubling the number of input samples. Other methods were tested, such as random X reflections, rotations and X, Y translations. However, these produced unsatisfactory results, as positional information throughout these domains is effectively destroyed, and this had some influence on the classification abilities of the network at later stages. The produced plots were finally stored as RGB matrices and were then downscaled to the required AlexNet input dimensions. The CNNs were trained with the following data shares: 60% training, 10% validation (applied during the training process as an unbiased feedback measure) and 30% testing (completely unseen data applied after the training process). This was carried out on a high-performance workstation with an Nvidia GTX 1080 Ti, resulting in a training time of ∼1 h per network.
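The class balancing and reflection steps can be sketched as follows. The helper names (`add_awgn`, `flip_vertical`, `balance_no_payload`), the 4 × 4 toy images, the clipping of pixel values to [0, 1] and the use of equally spaced variance levels (0.02, 0.03, 0.04 and 0.05 for a 1:5 imbalance) are assumptions made for illustration. The text only states that the variance increases linearly between 0.02 and 0.05 and that the reflection mirrors the image about the X-axis.

```python
import random


def add_awgn(image, variance, rng):
    """Add zero-mean white Gaussian noise of the given variance to every pixel."""
    sigma = variance ** 0.5
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row] for row in image]


def flip_vertical(image):
    """Mirror the image about its horizontal (X) axis by reversing the row order."""
    return image[::-1]


def balance_no_payload(images, ratio, var_lo=0.02, var_hi=0.05, seed=0):
    """Grow the minority class by a factor `ratio` using progressively noisier copies."""
    rng = random.Random(seed)
    n_copies = ratio - 1
    step = (var_hi - var_lo) / (n_copies - 1) if n_copies > 1 else 0.0
    out = list(images)
    for i in range(n_copies):
        variance = var_lo + i * step
        out.extend(add_awgn(img, variance, rng) for img in images)
    return out


# Toy stand-in for the 36 no-payload hovering plots (pixel values in [0, 1]).
rng = random.Random(1)
no_payload = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(36)]
balanced = balance_no_payload(no_payload, ratio=5)
# The reflection is applied here to the whole set only to show the doubling;
# in the paper it is applied to the training share.
reflected = balanced + [flip_vertical(img) for img in balanced]
print(len(balanced), len(reflected))
```

With 36 original no-payload samples and a ratio of 5, the balanced class holds 180 samples, which matches the size of the payload class for the hovering data.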

Hovering data
The hovering data set consisted of six 30 s recordings for each node (18 in total), cut into 5 s windows and split at 0 Hz (216 samples in total). The class imbalance had to be rectified, so the no payload class was increased in size to 180 samples through the AWGN augmentation process to equal the payload class (total 360). This was then split into the data groups, giving 432 samples for training (after the Y reflection), 36 samples for validation and 108 for testing. It should be noted that this is the number of samples for each transformation and that all three are applied to the CNN. The confusion matrix for the 6-class problem is shown in Fig. 4. The test was repeated 5 times and the mean accuracy is given in each case. A total accuracy of 95.1% was achieved, with 97.2% for the spectrograms, 93.5% for the cepstrograms and 94.4% for the CVDs (with the percentage share of samples for each target class depicted in each cell).

Flying data
The flying data set consisted of five 30 s recordings for each node, cut and split in the same way, resulting in 180 samples. The class imbalance was 1:4, so the no payload class was increased to 144 samples (total dataset size 288). This was then split into the data groups, giving 344 samples for training (after the Y reflection), 28 for validation and 88 for testing. The testing procedure was repeated as detailed in Section 4.1, with the confusion matrix shown in Fig. 5. An accuracy of 96.6% was achieved, with 97.7% for the spectrograms, 100% for the cepstrograms and 92.0% for the CVDs. A crucial transfer learning test was also performed by applying the flying test data to the hovering CNN and vice versa. An average accuracy of 78% was achieved for both situations, indicating that neither network grossly overfitted to its associated training set.
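The sample counts quoted for the two data sets can be reproduced with a few lines of integer arithmetic. The function name `dataset_sizes` and the rounding rule (training and validation shares rounded down, with the remainder assigned to testing) are assumptions, chosen because they reproduce the figures in the text. The recording counts, the six 5 s windows per 30 s recording, the split at 0 Hz and the 60/10/30 shares are from the paper.

```python
def dataset_sizes(n_recordings, n_weights, windows=6, halves=2):
    """Reproduce the sample bookkeeping for one scenario and one transformation."""
    raw = n_recordings * windows * halves   # 5 s windows, each split at 0 Hz
    no_payload = raw // n_weights           # one payload setting is "no payload"
    payload = raw - no_payload
    balanced = 2 * payload                  # no-payload class grown to match
    train = balanced * 60 // 100            # floor rounding is an assumption
    val = balanced * 10 // 100
    test = balanced - train - val
    # Training samples are doubled by the Y reflection.
    return {"raw": raw, "balanced": balanced, "train": 2 * train, "val": val, "test": test}


hovering = dataset_sizes(n_recordings=18, n_weights=6)
flying = dataset_sizes(n_recordings=15, n_weights=5)
print(hovering)
print(flying)
```

These are the counts per transformation; since all three transformations are applied to the CNN, the number of images seen by each network is three times larger.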

Performance analysis
The activations of twelve different images, taken from the channel in the second ReLU layer of the network that provided the best response for each image class, are visualised in Figs. 6 and 7. The first row shows activations for images with no payload and the second row shows those for images with a payload. The second layer was chosen because the first layer only demonstrated preliminary feature selection, where all the significant patterns were isolated. From the third layer onwards, the activations became too abstract to visualise and the subtleties were not obvious. The second layer was a compromise between the two, as it presented features that are visually discernible to a human, though the deeper layers exhibit features which machines are better equipped to detect and to classify upon [25]. Fig. 6 shows the activations when the cepstrogram test images (drone flying) are applied. Here the differences are visually apparent through the strength of the cepstral components. This layer is activating on the strong and continuous signal from the payload case, rather than the weaker and broken components from the non-payload case. Fig. 7 shows the activations when the CVD test images (drone flying) are applied to the CNN. Again, there is a clear visual difference between the two classes. For the no payload case, there are noticeable activations on the 'spikier' frequency content across the Doppler cell where the rotor component is present. This is detected through the horizontal frequency transitions in the spectrum, which appear as 'flicks' in the channel activation plot. This is not the case when the drone is carrying a payload, as the spectrum appears to be consistently concentrated at the lower cadence frequencies, with minimal frequency spurs occurring and hence reduced activations on these transitions.

Conclusion
Activations for both the cepstrogram and the CVD were visualised and noticeable differences between the payload cases were identified. Since the transformations effectively correspond to a binary class, there is an opportunity to fuse the results to support a final decision [16,26], making the process more robust. The CVD is the most dependent transformation, as it relies strongly on the integration time. In this case it was 0.5 s, although the clarity of the plot improved significantly for longer dwell times. In a practical scenario, the dwell time is a pivotal trade-off, as a balance must be found between reliable classification and a reasonable decision time. Positional information within all the images was found to be quite important, as introducing translations and similar augmentations led to poor classification accuracies. Forcing the CNN to tackle this scenario as a six-class problem did not hinder results too seriously, with similar accuracies being achieved when images from the same transformation were applied to their own bespoke CNN, trained to work only for that image class.
Future work will focus on generating a greater database of signatures, potentially with another model of drone to see if the same network can cope with diverse images and to ensure it is not learning features from a specific type/model of drone.