DroneRF dataset: A dataset of drones for RF-based detection, classification and identification

Modern technology has pushed us into the information age, making it easier to generate and record vast quantities of new data. Datasets can help in analyzing the situation to give a better understanding, and more importantly, decision making. Consequently, datasets, and uses to which they can be put, have become increasingly valuable commodities. This article describes the DroneRF dataset: a radio frequency (RF) based dataset of drones functioning in different modes, including off, on and connected, hovering, flying, and video recording. The dataset contains recordings of RF activities, composed of 227 recorded segments collected from 3 different drones, as well as recordings of background RF activities with no drones. The data has been collected by RF receivers that intercepts the drone's communications with the flight control module. The receivers are connected to two laptops, via PCIe cables, that runs a program responsible for fetching, processing and storing the sensed RF data in a database. An example of how this dataset can be interpreted and used can be found in the related research article “RF-based drone detection and identification using deep learning approaches: an initiative towards a large open source drone database” (Al-Sa'd et al., 2019).


Data
In this article, we present an RF based dataset of drones functioning in different modes. The dataset consists of recorded segments of RF background activities with no drones, and segments of drones operating in different modes such as: off, on and connected, hovering, flying, and video recording (see Figs. 4e6). The records are 10.25 seconds of RF background activities and approximately 5.25 seconds of RF drone communications for each flight mode. This has produced a drone RF database with over 40 GB of data encompassing various RF signatures. There are in total 227 segments (see Table 1), each segment consists of two equally sized parts with each part containing 1 million samples, making in total 454 record files. The samples in the segments represent the amplitude of the acquired row RF signals in the time domain. The segments in the database [2] are stored as comma-separated values (csv) files, this makes the drone RF database easy to load and interpret on any preferred software. Metadata for each segment in the database is included within its filename. It contains the segment Binary Unique Identifier (BUI) to differentiate between the drone's mode (see Fig. 7), followed by a character to determine if it is the first or second half of the segment, and its segment number. For Value of the data The droneRF dataset can be used to develop new techniques for drones' detection and identification, or as a critical building block in a large-scale anti-drone system that includes other functions such as drones' intrusion detection, tracking, jamming, and activity logging. DroneRF helps in understanding the signatures of different drones operating in different modes (see section 1.6 for details about the drones' flight modes) based on their radio frequency signal characteristics. DroneRF can inspire new methods for detecting the drones' existence, and possibly identifying the drones' make, type, etc.     instance, the third segment of the second half of the RF spectrum with BUI ¼ 11010, Phantom drone with flight mode number 3, will have the following filename: "11010H3.csv".

Equipment
In our experiment, flight controllers and mobile phones were used to send and receive the RF commands, to and from the drones under analysis to alter their flight mode. Controlling the drones by a  mobile phone requires mobile applications that are specifically developed for each drone. "FreeFlight Pro", "AR.FreeFlight", and "DJI Go" are free mobile applications developed to control the Bebop, AR, and Phantom drones, respectively. RF commands intercepted by two RF receivers, described in the next section, each receiver communicates with a laptop (CPU Intel Core i5-520M 2.4 GHz, 4 GB RAM, running Windows 7 64-bit) through PCIe cable (see Fig. 1), doing data fetching, processing and storing are performed by programs we designed in LabVIEW Communications System Design Suite [3].

Sensing and capturing RF signals
RF sensing and capturing process required RF receivers, to intercept the drone communication with its flight controller, connected to laptops that are responsible for fetching, processing and storing the recorded RF signals in the database. The drones that have been used in the experiment operates differently in terms of connectivity, (e.g. Bebop and Phantom use WiFi 2.4 GHz and 5 GHz with different bandwidths). For more details, one can read the full specification in [5e7]. In this work, it is assumed that all drones use WiFi operated at 2.4 GHz. Hence, there are some minimal assumptions. However, the drone operating frequency can be detected using various methods such as passive frequency scanning.
First, raw RF samples are acquired using two National Instruments USRP-2943 (NI-USRP) software defined radio reconfigurable devices, shown in Fig. 2: NI USRP-2943R RF receiver. Table 2 lists the NI- Fig. 7. Experiments to record drones RF signatures organized in a tree manner consisting of three levels. The horizontal dashed red lines define the levels. BUI is a Binary Unique Identifier for each component to be used in labelling [1].  Fig. 4). After that, captured RF data is transferred from the NI-USRP receivers to two standard laptops via Peripheral Component Interconnect Express (PCIe) interface kits. Finally, data fetching, processing and storing are performed by programs we designed in LabVIEW Communications System Design Suite. The programs are designed in a standard LabVIEW manner using front panel and block diagram environments. Fig. 3b shows the block diagram that demonstrates all the functions and operations used to fetch process and store the RF data. As demonstrated in Fig. 3a, by using the front panel, one can alter the captured band; lower half or upper half of the RF spectrum, carrier frequency, IQ rate, number of samples per segment, gain, and activate a specific channel of the NI-USRP receiver. In addition, one can select different flight modes and experiments to build a comprehensive database.

Experimental setup
The setup is shown in Fig. 1. To conduct any experiment using this setup, one must perform the following tasks carefully and sequentially. If you are recording RF background activities, perform tasks 4e7: 1. Turn on the drone under analysis and connect to it using a mobile phone or a flight controller. 2. In case the utility of a mobile phone as a controller, start the mobile application to control the drone and to change its flight mode. 3. Check the drone connectivity and operation by performing simple takeoff, hovering, and landing tests. 4. Turn on the RF receivers to intercept all RF activities and to transfer those to the laptops via the PCIe connectors. 5. Open the LabVIEW programs, installed on the laptops, and select appropriate parameters depending on your experiment and requirements. 6. Start the LabVIEW programs to fetch, process and store RF data segments. 7. Stop the LabVIEW programs when you are done with the experiment. 8. For a different flight mode, go back to step 6, and for different drones go back to step 1.

Experiments
The RF drone database is populated with the required signatures by conducting experiments organized in a tree manner with three levels as demonstrated in Fig. 7. The first level consists of the following branches to train and assess the drone detection system: Drones are off; RF background activities are recorded (see Fig. 4a). Drones are on; drones RF activities are recorded (see Fig. 4b).
The second level includes experiments that are conducted on the three drones under analysis: Bebop, AR, and the Phantom drones, to train and assess the drone identification system. Finally, the third level expands its predecessor by explicitly controlling the flight mode of each drone under analysis to assess the identification system ability in determining the flight mode of intruding drones.
On and connected to the controller (Mode 1, see Fig. 5a). Hovering automatically with no physical intervention nor control commands from the controller. Hovering altitude is determined by the drone manufacturer (approximately 1 m) (Mode 2, see Fig. 5b).
Flying without video recording. Note that the drone must not hit any obstacles in this experiment to avoid warning signals (Mode 3, see Fig. 5c). Flying with video recording (Mode 4, see Fig. 5d).
Note that the drone must not hit any obstacles in this experiment to avoid warning signals. The former experiments are conducted by following the steps summarized in the previous section.

RF database labeling
The BUI is used to label the RF database entries according to the conducted experiment, drone type, and its specific flight mode, see Fig. 7: Experiments to record drones RF signatures organized in a tree manner consisting of three levels. The horizontal dashed red lines define the levels. BUI is a Binary Unique Identifier for each component to be used in labelling. The BUI is comprised of two binary numbers concatenated such that: BUI ¼ [msBUI, lsBUI], msBUI is the most significant part of the BUI representing the experiment and drone type, levels one and two, while lsBUI is the least significant part of the BUI representing the drone flight mode, third level. The BUI length L is determined using the total number of experiments E, the total number of drones D, and the total number of flight modes F, as follows: L ¼ log 2 ðEÞ þ log 2 ðDÞ þ log 2 ðFÞ where in this work, E ¼ 2; D ¼ 3 and F ¼ 4; therefore, L ¼ 5. Extending the developed database using other experiments, drones, or flight modes can be easily done by increasing E; Dor F, respectively.

Visualizing the data
To visualize the data, various software can be used. As mentioned before, each flight mode recording is composed of segments, each segment is split into two parts. In order to plot a segment, the two parts should be loaded into the software work-space together (E.g. 11000L_3, 11000H_3). After loading the data, amplitude normalization can be done for a better visualization, and that's by dividing all the samples in the segment by its maximum absolute value to end up with values between 1 and -1. If needed for data analysis, frequency samples can also be calculated using Discrete Fourier transform (DFT) of each recorded segment coming from both receivers [1]. Figs. 4e6 were generated by shifting the samples of one part e.g. 11000L_3 up by adding 1, while shifting the other part down by subtracting 1.