Using deep learning in an embedded system for real-time target detection based on images from an unmanned aerial vehicle: vehicle detection as a case study

ABSTRACT For a majority of remote sensing applications of unmanned aerial vehicles (UAVs), the data need to be downloaded to ground devices for processing, but this procedure cannot satisfy the demands of real-time target detection. Our objective in this study is to develop a real-time system based on an embedded technology for image acquisition, target detection, the transmission and display of the results, and user interaction while providing support for the interactions between multiple UAVs and users. This work is divided into three parts: (1) We design the technical procedure and the framework for the implementation of a real-time target detection system according to application requirements. (2) We develop an efficient and reliable data transmission module to realize real-time cross-platform communication between airborne embedded devices and ground-side servers. (3) We optimize the YOLOv4 algorithm by using the K-Means algorithm and TensorRT inference to improve the accuracy and speed of detection on the NVIDIA Jetson TX2. In experiments involving static detection, the system achieved an overall confidence of 89.6% and a missed detection rate of 3.8%; in experiments involving dynamic detection, the overall confidence and missed detection rate were 88.2% and 4.6%, respectively.


Introduction
Because target detection based on deep learning has a strong capability for data processing and yields a high accuracy, it has emerged as an important area in the relevant research (Zhang, Chen, and Cai 2021; Boudjit and Ramzan 2022; Yang et al. 2022). As a new carrier platform and tool for data acquisition, the unmanned aerial vehicle (UAV) has been widely used in the field of remote sensing (Huang et al. 2022b; Jiang, Jiang, and Wang 2022). Target detection based on the UAV is a popular subject of research on detection, and has important applicative value (Gaszczak, Breckon, and Han 2011). It has been applied to target tracking (Zhang, Li, and Qi 2018), hazard/disaster surveys (Munawar et al. 2019), monitoring traffic violations (Qu, Jiang, and Guo 2016), and epidemic prevention and control (Roelofs et al. 2021). With the development of embedded technologies and equipment, it has become possible to detect targets based on UAVs in quasi-real time or real time (Kyrkou et al. 2018; Amato et al. 2019; Ringwald et al. 2019; Boudjit and Ramzan 2022; Masouleh and Hosseini 2022). This can provide real-time results of target detection.
A review of the literature on the requirements of real-time target detection by using UAVs shows that there are problems in current research that are reflected in three aspects: (1) Most studies (Masouleh and Hosseini 2019; Bai et al. 2021; Cheng et al. 2022; Gupta et al. 2022; Yang et al. 2022) have used the UAV only as a platform for data acquisition. The aerial images acquired by UAVs for target detection still need to be processed on the ground, and this makes it difficult to satisfy the requirements of real-time detection. (2) To overcome the problem in (1), researchers have explored real-time target detection based on UAVs (Kyrkou et al. 2018; Amato et al. 2019; Ringwald et al. 2019; Boudjit and Ramzan 2022; Masouleh and Hosseini 2022) and used embedded devices (such as the NVIDIA Jetson TX2; hereinafter referred to as TX2) for the real-time processing of the data. To adapt to the performance and architecture of the embedded platform, most studies have used lightweight embedded versions of deep learning models, such as MobileNet (Howard et al. 2019), Tiny YOLO, and DroNet (Kyrkou et al. 2018). Experiments have shown that such models have a high processing speed but poor accuracy of detection. (3) Current applications of real-time target detection based on UAVs lack unified system management that can provide effective support for multiple UAVs, display the results of real-time detection, and allow for real-time target detection and control in multiple modes. Such application management should start from the perspective of the Browser/Server (B/S) architecture, take into account the scalability and high capability of interaction of the system, and systematically integrate the corresponding software modules that are developed. At the same time, it requires an efficient communication transmission module to support its functions.
Motivated by the above, we design an integrated system for image acquisition, target detection, and the transmission, display, and management of the results in real time based on data acquired from equipment embedded in UAVs. We use the quasi-real-time detection of vehicles as an example to illustrate the entire application process. That is, we design and implement a UAV-based real-time target detection system (UAV-RTDS). The contributions of this study are as follows.
(1) In order to ensure accurate target detection, we use the YOLOv4 algorithm (Bochkovskiy, Wang, and Liao 2020), rather than a lightweight version of it, on the embedded device TX2.
To address the poor speed and accuracy of detection of the YOLOv4 algorithm on embedded devices, we use the K-Means algorithm to cluster the anchors of the YOLOv4 algorithm to improve its accuracy of target detection. We then apply TensorRT inference to the optimized algorithm to improve its speed of detection. Finally, we use vehicles as the target of detection to experimentally verify the complete system.
(2) We collect images by using UAVs through serial port communication, and leverage the relevant characteristics of the local area network (LAN) to implement high-performance asynchronous I/O data communication under the Netty framework. This is used to develop a high-performance communication module between the embedded devices and the ground.
(3) An automatic system is developed for the real-time acquisition, processing, transmission, management, and control of images acquired by UAVs. The system is applicable to a variety of scenarios involving UAV images, and is adequately scalable and interactive. It supports the simultaneous registration and use of multiple UAVs. Users can access, browse, and control the system by remotely logging into it.
The usefulness and stability of the system were verified through field tests and experiments. However, the system is still in its infancy and can be regarded only as a quasi-real-time system; there is room to further optimize it in terms of real-time processing and remote data transmission. Although the research here is mainly based on prevalent algorithms and related technologies, we also make major contributions to the relevant areas: (1) We show how to implement the key algorithms and modules mentioned above, e.g. the improved YOLOv4 target detection algorithm and the data transmission module, and how to combine, integrate, and optimize different parts to build an interactive, shared, and quasi-real-time target detection system based on images acquired from UAVs. (2) We provide a complete development paradigm for similar application systems (such as other applications of target detection), which has guiding significance for research in the area. These contributions can help promote developments in target detection, especially in real-time target detection based on UAVs.
The remainder of this paper is organized as follows: Section 2 gives a brief introduction to related work on target detection algorithms based on deep learning, the combination of UAVs and deep learning-based target detection, and real-time processing systems for target detection based on UAVs combined with the development of the requisite embedded equipment. Section 3 introduces the research scheme, framework, and principle of operation of our proposed UAV-RTDS. We detail the design and implementation of the system in Section 4, including the optimization of the target detection module (TDM), the design and implementation of the real-time file transmission module, and the UAV management software. Section 5 describes experiments to test the performance of the proposed system and discusses the results. Finally, the conclusions of this study and directions for future research in the area are provided in Section 6.

Related work
Deep learning has natural advantages in data processing for remote sensing operations, such as the removal of clouds and noise, data fusion, land classification, and target detection (Han et al. 2021; Zhang, Zhou and Luo 2021). Advances in deep learning have ushered in an era of rapid development in target detection. Target detection in the context of deep learning can be divided into two categories (Sharma and Mir 2020): 'two-stage detection' and 'one-stage detection.' The two-stage algorithm first selects candidate regions in the given image, and then classifies the target and refines its location within them. Girshick et al. (2014) applied deep learning to target detection and proposed the regions with CNN features (R-CNN) algorithm, which laid the foundation for subsequent target detection algorithms based on the CNN. He et al. (2014) optimized the architecture of the CNN by introducing the spatial pyramid pooling layer, and proposed the SPP-Net detection algorithm, which is significantly faster than R-CNN. In 2015, Girshick proposed the Fast R-CNN model, which can discriminate among all possible candidate boxes in the extracted feature map and is significantly faster than R-CNN in terms of training and detection. In the same year, the Faster R-CNN algorithm was proposed by Ren et al. (2015). The one-stage algorithm uses global regression-based classification to directly generate the position and category of the target object. Compared with the two-stage algorithm, it can better meet the requirements of real-time detection. In 2016, Redmon et al. proposed the YOLO algorithm.
Due to its advantages of convenience, portability, and low cost, the UAV platform is often used for the real-time acquisition of small-scale image data. However, traditional research on image processing based on the UAV platform has mainly involved transmitting the acquired images to a local high-performance computing platform for processing (Masouleh and Hosseini 2019; Ammar et al. 2021; Cheng et al. 2022; Gupta et al. 2022; Zakria et al. 2022). A review of the literature shows that prevalent research on the combination of the UAV and deep learning-based target detection algorithms has focused on the following: (1) Many studies have used deep learning-based target detection algorithms to process the aerial images acquired by UAVs to improve the speed and accuracy of detection. (2) Some research in the area is based on a combination of UAVs and deep learning-based target detection algorithms to implement or integrate system functions for specific applications in specific scenarios, or to enhance and improve algorithms such that they are suitable for such application systems.
Research falling into category (1) above has involved studies on both two-stage and one-stage algorithms. For example, Sommer, Schuchert, and Beyerer (2019) systematically investigated the use of Fast R-CNN and Faster R-CNN on images acquired by UAVs, and delivered the best results on commonly used benchmark datasets. To overcome the shortcomings of the original approaches in the case of a small number of instances, they proposed their own networks that outperformed the traditional methods in terms of identifying vehicles in UAV images. Masouleh and Hosseini (2019) used deep learning for vehicle detection based on thermal UAV images. They improved the performance of a deep learning-based model by using specifications of the Gauss-Bernoulli restricted Boltzmann machine (GB-RBM) to segment ground vehicles in UAV-based thermal infrared images. Bai et al. (2021) proposed an improved SSD model based on deep feature fusion. Through the effective fusion of multi-level and multi-scale information on features of the target, the feature layer used for prediction can make full use of them to improve the ability of the algorithm to detect targets and small objects at different scales. Ren et al. (2022) proposed an improved Mask-RCNN algorithm to improve the speed of detection of objects in thermal infrared images acquired from UAVs while reducing the storage space required. Bouguettaya et al. (2022) systematically reviewed the development of deep learning-based methods of vehicle detection and datasets of images acquired from UAVs, and identified the main problems in this line of research. The YOLO series of algorithms, which are one-stage algorithms, are the mainstream in target detection based on deep learning owing to their excellent speed and accuracy of detection. Ammar et al. (2021) compared the effects of YOLOv3 and YOLOv4 on the task of vehicle detection. The results showed that they performed differently on different datasets but were superior to Faster R-CNN. Liu et al. (2020) developed a method for identifying small objects in images acquired from UAVs based on YOLOv3, i.e. UAV-YOLO. This method improved YOLOv3 by training the model and optimizing the structure of the backbone to improve its capability for detecting small objects. Gupta et al. (2022) used Tiny YOLOv3 to detect military vehicles, and collected and labeled a new dataset. Tiny YOLOv3 delivered good detection-related performance on the new dataset. Zakria et al. (2022) improved the accuracy and robustness of the YOLOv4 network in terms of target detection in remote sensing images. YOLOv4 delivered better performance in detecting targets in optical remote sensing images than traditional methods, with a mean average precision of 75.15%. The authors also proposed two improved schemes for allocating anchor frames for YOLOv4 that improved its detection-related performance on some categories of targets. Yang et al. (2022) proposed the BCo-YOLOv5 network model to identify and detect fruits in images of orchards. They introduced a bidirectional cross-attention mechanism between the backbone of the YOLOv5s model and the neck network to enhance the ability to extract locally correlated and directional features of the image to improve its accuracy of detection.
Of the research falling into category (2) above, some studies have focused on specific application systems. Ibrahim et al. (2010) proposed a system to track and detect moving targets based on the UAV that formed a complete map of the area of detection through image mosaicking. Sun et al. (2016) developed an integrated target detection and positioning system based on cameras, integrated it into a fully automatic fixed-wing UAV, and applied it to wilderness search and rescue work. Similarly, Hinas, Roberts, and Gonzalez (2017) developed a system that uses a target detection algorithm and a multi-rotor UAV to find and inspect targets on the ground. In research on improved algorithms for system applications, Fang et al. (2021) transformed the problem of detecting a target in thermal infrared images acquired from small UAVs into the problem of residual image prediction to identify targets in dim images with serious clutter. Li et al. (2022) proposed the U2U-D&T algorithm, which can reliably detect and track the target from images acquired by a UAV. Masouleh and Hosseini (2022) proposed SA-Net for detecting abnormal targets in a small number of images from a UAV. Cheng et al. (2022) used the YOLOv3 algorithm along with the mean shift and the Kalman filter algorithm to track mobile vehicles.
The requirement of real-time target detection has been rarely considered in the above research. An embedded system is a special-purpose computer system that takes the application as its center and computer technology as its basis, with software and hardware modules flexibly tailored according to the needs of the user. It has the characteristics of a small code footprint, high automation, and fast response (Ang and Seng 2021; Liu et al. 2021). Using embedded equipment on a UAV can not only help realize real-time processing, but can also help exploit the unique advantages of flexibility, speed, and mobility of the UAV (Mademlis et al. 2022). With the development of the graphics processing unit (GPU), the computational capability of embedded devices has significantly improved such that they can play an important role in deep learning (Pereira and Pereira 2015; Peng, Lin, and Dai 2016; Basso et al. 2019).
The research on real-time target detection by using UAVs and embedded computing equipment remains in its infancy. Researchers have examined different methods of implementation to this end. Some have proposed downloading/transferring images, and for the target to be detected in them on the ground following quick processing to obtain a quasi-real-time result of detection. For example, Boudjit and Ramzan (2022) proposed a method to detect objects based on the YOLOv2 network by sending images from UAVs to a PC. The received image is quickly processed by the YOLOv2 algorithm to identify the target. The response speed of the algorithm for real-time tracking needs to be faster than that of traditional methods such that it can track the target without losing sight of it. In earlier work in the area, Sun et al. (2016) implemented an integrated camera-based target detection and positioning system in a similar way. In contrast to the computing equipment used in previous studies, Huang et al. (2022a) proposed a system to identify citrus fruits in images acquired by a UAV with artificial intelligence equipment on it. The system used a small UAV equipped with a camera to take full-sized photos of citrus trees. The YOLOv5 model for target detection was improved and transplanted into an edge computing device, the NVIDIA Jetson Nano, to precisely identify the target in data collected by a UAV. The rapid recognition and processing of UAV images in that study was also realized by using the Jetson Nano on the ground through a wireless network. Note that such a method in essence is still equivalent to traditional ground processing. With the development of embedded development technologies, the means and equipment for processing data acquired by a UAV on the ground can be further miniaturized and integrated. For instance, Kyrkou et al. (2018) designed an efficient CNN, DroNet, that is suitable for an embedded UAV platform based on YOLO as the target detector. This network can recognize only a single target accurately and in real time. With improvements in the computational performance of embedded devices, more studies have directly used them on the UAV for the real-time processing of complex deep learning models. For example, the short-range target detection system developed by Hinas, Roberts, and Gonzalez (2017), based on a combination of a target detection algorithm and a multi-rotor UAV, uses a UAV embedded with a Raspberry Pi for data processing and real-time detection. The UAV can hover above the target for a few seconds to detect it. Ringwald et al. (2019) proposed a model of vehicle detection, UAV-Net, that is suitable for deployment on an embedded platform. It can achieve real-time detection on the TX2 platform. Amato et al. (2019) used YOLOv3 to detect and count the number of vehicles in a given area, deployed it on the TX2 platform, and achieved good results. Su et al. (2021) proposed a remote sensing-based method of detecting vehicles in images based on the YOLOv5 model, and deployed it in a TX2-embedded device that could be deployed on a satellite platform. The results of experiments showed that this algorithm can detect vehicles in a wide-range remote sensing image with a resolution of 12,000 × 12,000 pixels on an embedded device in only about 80 s. Moreover, Masouleh and Hosseini (2022) proposed a method of real-time target detection based on images acquired from a UAV that can be used in urban management and precision farming.
To sum up, target detection algorithms based on deep learning are now the mainstream in the area. Research on real-time target detection based on embedded development technology in UAVs and deep learning can give full play to the advantages of flexibility, quickness, and convenience of UAVs. This has a wide range of prospects for application, e.g. in security and monitoring, forest fire prevention, monitoring environmental pollution, and urban intelligent transportation. Compared with the relevant research in the area, ours is different in terms of the detection model used, real-time data transmission, and system integration and optimization. We explore the key technologies involved in a universal and interactive quasi-real-time target detection system.

Analysis of system requirements
The UAV-RTDS proposed in this study must meet the following requirements.
(1) The user can manually control image acquisition by the UAV from the ground and obtain the timestamp and positional information of each image.
(2) The UAV-RTDS can realize the transmission and synchronization of the results of identification between the ground and the equipment on the UAV. Because the equipment on the ground and that on the UAV run the Windows and Linux operating systems, respectively, the module for data transmission and communication must satisfy cross-platform requirements.
(3) The system can store and manage the results of detection at the ground end, display them, and concurrently operate the corresponding interface on the front page of the ground-side display.
(4) The system can connect the processes of image acquisition, target detection, and the transmission, storage, and display of the results of detection to realize the entire process of UAV-based real-time target detection.

System hardware
A certain amount of hardware support is required to meet the above requirements. All the hardware and its connections are shown in Figure 1. The system hardware can be divided into two parts: the onboard part and the ground part. The onboard equipment is composed of a DJI M600 Pro UAV, an NVIDIA Jetson TX2 embedded development board, a charge-coupled device (CCD) camera, a global positioning system (GPS) module, and a Wi-Fi module on the TX2. The TX2, CCD camera, and GPS module are fixed on the simple pan-tilt mount of the UAV. The TX2 draws power from the UAV through a voltage-stabilizing module, while the CCD camera and GPS module are powered by the TX2. The ground part consists of a USB-powered high-power wireless router and a laptop server. The TX2 in the onboard equipment and the server on the ground communicate through a LAN composed of routers.

Overall framework and operational principle
The overall framework and operational principle of the UAV-RTDS are shown in Figure 2. From the perspective of the software, the system can be divided into three parts, viz., (1) the UAV client deployed on the TX2 development board on the UAV, (2) the ground server deployed on a laptop, and (3) the front-end system primarily accessed by the user through the browser.The framework contains a data communication module used to connect the UAV client to the ground server and a detection module to perform the task of target detection.
The overall operational principle of the system is as follows: (1) Pre-processing. The data are collected and the target in them is labeled to obtain the training dataset. The detection model is then trained by using the data on a high-performance computing platform on the ground to obtain the corresponding file of trained weights. This file is uploaded to the detection module on the client of the UAV to identify the target. This process is shown in the red box in Figure 2. (2) The user accesses the Web through the front-end system and sends an active photographing command to the ground server. The ground server in turn sends the command to the UAV client through the data transmission module (DTM), as shown in the green box in Figure 2. (3) The client collects images and obtains the relevant GPS information. (4) The captured images and GPS data are directly saved to the embedded development board through serial communication, and real-time target detection is then carried out through the detection module (as shown in the blue box in Figure 2). (5) The results of detection are directly transmitted to the ground server. Users can access, view, and manage them online. This constitutes the workflow of the complete real-time processing system for target detection based on images acquired from UAVs.

Design and implementation of the UAV-RTDS
To implement manual and automatic photography, target detection, the transmission and display of the results, user control, and real-time interaction among UAVs and multiple users of the UAV-RTDS, we divide the implementation of the system design into three parts according to the overall framework shown in Figure 2: (1) a TDM, as shown in the blue box in Figure 2, (2) a DTM, as shown in the green box in Figure 2, and (3) a UAV management software system (MSS). The TDM is responsible for target detection in the collected images and returning the results. The DTM is responsible for ensuring smooth communication between the UAV and the ground server to complete image transmission. The UAV MSS, which includes a UAV client, a ground server, and a front end, is responsible for calling each module to complete the entire detection process, realizing interaction between the UAV and the user, and displaying the results.

Optimizing the TDM
4.1.1. Limitations on the performance of the YOLOv4 algorithm on an embedded platform
A TDM based on a CNN can be very accurate but still requires a high-performance GPU to run in real time. Owing to its unique network structure, the YOLO framework proposed by Redmon et al. (2016) needs only one forward propagation over the target image to perform its prediction, which tremendously improves the speed of computation while ensuring the accuracy of the result. This renders it suitable for real-time application scenarios. The academic community has continued to promote iterations of the YOLO algorithm, from YOLOv1, YOLOv2, and YOLOv3 to YOLOv4 and the recently developed YOLOv7; hence, the speed and accuracy of detection of YOLO are continually being optimized. YOLOv4 is a target detection algorithm proposed by Bochkovskiy, Wang, and Liao (2020) based on YOLOv3 that focuses on the accurate and quick detection of small targets. As it was the latest available version at the time this study was conducted, we chose it to verify the entirety of the proposed system.
We first tested YOLOv4 on the embedded platform TX2.We chose two environments, an ordinary PC and the embedded device TX2, for a comparative analysis.The results are shown in Table 1.
We chose 20 images of three sizes, 512 × 512, 1024 × 1024, and 2048 × 2048 pixels, to test the performance of the original algorithm.The results are shown in Table 2.
The results in Table 2 show that the performance of the YOLOv4 algorithm on the TX2 was poor due to the limited performance of the embedded development board. Compared with the ordinary PC, the TX2 took far longer to run the YOLOv4 algorithm: processing the same data took 147, 172, and 175 times longer on the TX2 platform than on the ordinary PC for images of 512 × 512, 1024 × 1024, and 2048 × 2048 pixels, respectively.
Furthermore, the anchor clustering of the YOLOv4 algorithm was based on the 20 categories of the Pascal visual object class (VOC) dataset. Because the lengths and widths of these categories differ widely, the cluster centers are far apart. The images in the dataset used to train the method in this study involved only two categories, vehicles and background, such that the fixed frames could not be satisfactorily calculated. This inevitably led to the target not being within the specified boundary, thus affecting the accuracy of detection. To ensure the accuracy and real-time operation of UAV-based target detection, we optimize YOLOv4 based on the K-Means algorithm and TensorRT.
4.1.2. Improving the accuracy of YOLOv4 by using the K-Means algorithm
Our work relies on the K-Means algorithm (MacQueen 1967). We cluster the bounding boxes of YOLOv4 to search for more accurate anchor boxes, and thereby improve the rate of detection and the intersection over union (IOU). The processing flow of the YOLOv4 algorithm optimized based on the K-Means algorithm (hereinafter referred to as the K-YOLOv4 algorithm) is shown in Figure 3, in which the dotted box is the optimization flow added in this study.
A higher degree of coincidence between the prediction box (box) and the anchor boxes (anchor) in the YOLOv4 algorithm entails a higher accuracy of prediction. The network in the original algorithm exhibits translational invariance, and the positions of the anchor boxes are fixed. Moreover, its use of the Euclidean distance increases the rate of error of the prediction box. Therefore, we redefine the distance between a prediction box and an anchor box, d(box, anchor), as follows:

d(box, anchor) = 1 − IOU(box, anchor)

According to this formula, a higher IOU results in a smaller distance and a higher probability that the clustered boxes belong to the same class. This implies a higher coincidence between the prediction box (box) and the anchor boxes (anchor).
Based on the format of the annotation file of YOLOv4 and the K-Means algorithm, the steps used to implement the K-YOLOv4 algorithm are as follows.
(1) All coordinates of the bounding boxes in the annotation file are extracted. Due to the translational invariance of the neural network, only the height and width of the extracted coordinates are retained, and are converted into the height and width of the bounding box.
(2) Randomly select k values from all the extracted bounding boxes as the initial values of the fixed boxes.
(3) Use the newly defined distance formula to calculate the IOU between each bounding box and each fixed box, and then calculate the distance.
(4) Assign each bounding box to its nearest fixed box, update each fixed box to the mean width and height of its cluster, and repeat from step (3) until the assignments no longer change.
Based on the above process of the K-Means algorithm, we cluster the bounding boxes of the dataset of YOLOv4 to obtain the anchor boxes and then replace the original anchor boxes with the clustered ones.
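The clustering steps above can be sketched as follows. This is a minimal illustrative implementation, not the authors' code: box widths and heights are clustered with the IOU-based distance d = 1 − IOU, where the IOU is computed on (width, height) pairs assumed to share a common top-left corner.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IOU between boxes (N, 2) and anchors (k, 2) given as (width, height),
    both assumed anchored at the same top-left corner. Returns an (N, k) matrix."""
    inter_w = np.minimum(boxes[:, None, 0], anchors[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    inter = inter_w * inter_h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] \
        + (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k, seed=0, iters=100):
    """Cluster (width, height) pairs with the distance d = 1 - IOU(box, anchor)."""
    rng = np.random.default_rng(seed)
    # Step (2): pick k random bounding boxes as initial anchors.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    assign = None
    for _ in range(iters):
        # Step (3): distance of every box to every anchor.
        d = 1.0 - iou_wh(boxes, anchors)
        new_assign = d.argmin(axis=1)
        # Step (4): stop when assignments are stable, else update centroids.
        if assign is not None and (new_assign == assign).all():
            break
        assign = new_assign
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)
    return anchors
```

The resulting k anchors would then replace the VOC-derived anchors in the YOLOv4 configuration. Using the median instead of the mean in the centroid update is an equally common choice.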
The loss function used in this study is identical to that of the YOLOv4 model. It considers the accuracy of classification and target localization as well as confidence:

Loss = E1 + E2 + E3

where E1, E2, and E3 refer to the classification loss, the localization loss (the error between the predicted bounding box and the ground truth), and the confidence loss, respectively. Detailed information on E1, E2, and E3 is provided in the literature (Bochkovskiy, Wang, and Liao 2020).

4.1.3. Accelerating the K-YOLOv4 algorithm based on TensorRT
We use the high-performance inference optimizer TensorRT 5.1.6, launched by NVIDIA, on the Jetson TX2 platform to accelerate the K-YOLOv4 algorithm. The entire process of inference-based acceleration can be divided into two stages, viz., compilation and use. In the compilation stage, we first convert the model according to the structure of the optimized YOLOv4 network and then feed the converted model into TensorRT for network consolidation and compression. This reduces the running time of the network, generates a new inference engine, and uses it for detection. We imported the model through a Python application program interface (API), and the network structure was defined by using the K-YOLOv4 algorithm. TensorRT simplifies the network, generates a new ONNX model, and then loads the trained weight file to be imported. Figure 4 shows a flowchart of the TensorRT optimization algorithm.
According to Figure 4, the file of trained weights of K-YOLOv4 is loaded through the DarknetParser, and the ONNX nodes are generated by parsing the network layers, such as convolutional, route, and yolo, in a layer-by-layer manner. An ONNX graph is then assembled from these nodes to produce the ONNX model. Following this, the ONNX model is built by using a Builder: a network object is created through the Builder to load the model, and a Parser is created to parse it. Finally, the Builder creates an engine file according to the network and then loads it to identify the target. In the next section, we verify the effect of optimization through experiments.
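The compile stage described above can be sketched with TensorRT's Python API. This is a hedged illustration, not the authors' code: the names follow NVIDIA's published TensorRT Python samples for later releases (TensorRT 7+), whereas the paper used TensorRT 5.1.6, whose API differs in detail; the import is guarded because TensorRT is only available on NVIDIA platforms.

```python
# Hypothetical sketch: parse an ONNX model (e.g. one exported from K-YOLOv4)
# and build a TensorRT inference engine. API names assumed from NVIDIA samples.
try:
    import tensorrt as trt
except ImportError:  # TensorRT is only installed on NVIDIA platforms such as the TX2
    trt = None

def build_engine(onnx_path, workspace_bytes=1 << 28, use_fp16=True):
    """Parse an ONNX file with OnnxParser and return a TensorRT engine,
    or None if parsing fails."""
    if trt is None:
        raise RuntimeError("TensorRT is not installed on this platform")
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):  # report parse errors, if any
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    config = builder.create_builder_config()
    config.max_workspace_size = workspace_bytes
    if use_fp16 and builder.platform_has_fast_fp16:
        # Half precision is a common way to speed up inference on the TX2.
        config.set_flag(trt.BuilderFlag.FP16)
    return builder.build_engine(network, config)
```

The returned engine would typically be serialized to a file once and deserialized at start-up, since engine building is slow on embedded hardware.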

Design of the DTM
To realize the DTM between the UAV and the ground terminal, and to ensure the reliability of file transmission, a Ji router (a USB-powered high-power router for field use) based on LAN WiFi was selected as the communication equipment for this study. Concomitantly, we developed a DTM suitable for the equipment and implemented it by using the high-performance and reliable transmission control protocol (TCP) and the Netty framework. This module is responsible for communication and data transmission between different hosts and platforms. Furthermore, to maintain the integrity of the data files, it was necessary to establish a stable and reliable transmission channel. The DTM relied on the TCP to establish a reliable connection in the transport layer. While sending data, the DTM ensured the integrity of transmission through a series of mechanisms, such as acknowledgment, congestion control, timeout retransmission, the three-way handshake for connection, and the four-way handshake for disconnection. These mechanisms ensure that the data accurately reach the receiver. We first designed the transmission format, as shown in Figure 5, based on the TCP.
Based on the above data format, the mode of interaction between the client and the server can be described by the sequence diagram shown in Figure 6. In Figure 5, 'instruction 0' is the request for data transmission. The client sends the data transmission request to the server and informs it of the file name and size. After receiving instruction 0, the server feeds back instruction 1 to the client and asks it to transfer data from a certain position in the file. After receiving instruction 1, the client starts data transmission and provides information to the server regarding the file, such as its name, starting position, and end position, until file transmission has been completed. Following this, instruction 2 is sent to the server to confirm the completion of the file transfer. After receiving instruction 2, the server saves the file, sends instruction 1 in reply, sets the file status to 'Complete,' and both sides end the transfer.
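The instruction 0/1/2 exchange can be sketched over plain TCP sockets. This is an illustrative stand-in for the DTM (which the paper implements in Java with Netty): the JSON-plus-length framing, field names, and chunk size here are assumptions in place of the format of Figure 5; only the instruction sequence follows the description above.

```python
import json
import socket
import threading

def send_msg(sock, header, body=b""):
    """Frame a message as a length-prefixed JSON header plus raw bytes."""
    payload = json.dumps(header).encode()
    sock.sendall(len(payload).to_bytes(4, "big") + payload +
                 len(body).to_bytes(4, "big") + body)

def recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_msg(sock):
    header = json.loads(recv_exact(sock, int.from_bytes(recv_exact(sock, 4), "big")))
    body = recv_exact(sock, int.from_bytes(recv_exact(sock, 4), "big"))
    return header, body

def server(listener, store):
    """Server side: answer instruction 0 with a resume offset, collect data,
    and confirm completion on instruction 2."""
    conn, _ = listener.accept()
    with conn:
        req, _ = recv_msg(conn)                                # instruction 0: name and size
        received = store.setdefault(req["name"], bytearray())
        send_msg(conn, {"instr": 1, "offset": len(received)})  # transfer from this position
        while True:
            header, body = recv_msg(conn)
            if header["instr"] == 2:                           # instruction 2: transfer done
                send_msg(conn, {"instr": 1, "status": "Complete"})
                break
            received.extend(body)                              # data chunk

def transfer(addr, name, data):
    """Client side: request transfer, resume from the server's offset, confirm."""
    with socket.create_connection(addr) as sock:
        send_msg(sock, {"instr": 0, "name": name, "size": len(data)})
        offset = recv_msg(sock)[0]["offset"]
        for i in range(offset, len(data), 1024):
            send_msg(sock, {"instr": "data"}, data[i:i + 1024])
        send_msg(sock, {"instr": 2, "name": name})
        return recv_msg(sock)[0]["status"]
```

The resume offset returned in instruction 1 is what allows an interrupted transfer to continue from where it stopped rather than restarting the file.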
In file transfer based on the client-server architecture, the files to be transferred via the DTM are generally scanned when the client application starts, and the transfer does not continue after their completion. However, in our application scenario, the UAV continuously generates image files. Hence, it is necessary to design a client application that can continuously scan for newly generated data in real time once the application has started, so that the images generated by the UAV can be transmitted in real time. Therefore, we introduce a timing mechanism to the client application and design a timing scanner. However, if the timing mechanism is directly introduced into the client transmission module, this tightly coupled design causes file transmission to stop while the system waits for the timer to scan, owing to the uncertainty of the network and the duration of file transmission. We design a producer-consumer client model to solve this problem, as shown in Figure 7. It separates the timing scanning program from the DTM. The timing scanning program is responsible for scanning files as a producer, and the DTM is responsible for their transmission as a consumer.
According to Figure 7, the file scanning program as a producer and the DTM as a consumer are mutually independent, which avoids the problem of disorder in file transfer. Based on the file management tools provided by Java, the file scanning module of the subsystem is composed of a file scanner and a timer. Once the directory files have been scanned, they are placed in a blocking queue by the file queue manager. To avoid the repeated transmission of files, the file queue manager first determines whether a file has already been transmitted; if so, the file is not entered into the blocking queue. Owing to limitations of space, the specific implementation of the DTM using Java and the Netty framework is not discussed here.
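The producer-consumer split of Figure 7 can be sketched as follows. Python is used for consistency with the other examples, although the paper's client is implemented in Java with a blocking queue; the scan interval, file pattern, and function names are illustrative assumptions.

```python
import queue
import threading
import time
from pathlib import Path

def scan_directory(directory, pending, transmitted, stop, interval=1.0):
    """Producer: periodically scan the directory and enqueue only unseen files."""
    while not stop.is_set():
        for path in sorted(Path(directory).glob("*.jpg")):
            if path not in transmitted:   # the queue manager's duplicate check
                transmitted.add(path)
                pending.put(path)         # blocking queue decouples scanning from sending
        stop.wait(interval)               # the timer: rescan after the interval

def transmit_files(pending, sent_log, stop):
    """Consumer: drain the queue and hand each file to the DTM."""
    while not stop.is_set() or not pending.empty():
        try:
            path = pending.get(timeout=0.2)
        except queue.Empty:
            continue
        sent_log.append(path)             # stand-in for the real DTM transfer call
        pending.task_done()
```

Because the scanner never waits on the network and the sender never waits on the timer, a slow transfer cannot stall the discovery of newly captured images.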

UAV MSS
The overall architecture and composition of the modules of the UAV MSS are shown in Figure 8. It can be divided into three parts: (1) The UAV client. It is deployed on the TX2 development board on the UAV, and is primarily responsible for managing the corresponding client in the DTM and ensuring the establishment of the connection by constantly sending and receiving heartbeats. When a heartbeat is received, the manual and automatic instructions for photography sent by the UAV management server are obtained, and the detection module is called to generate the results. (2) The UAV management server. It is deployed on a laptop on the ground, and is responsible for managing the transmission operations of the client as well as the reception and management of the captured images. It also interacts with the front end and transmits front-end instructions to the client in the form of a data flow. (3) The front-end system. It is mainly accessed by users through a browser, and can display the image-related information in real time; the user can also manually control the camera to take photos in the front-end interface.

UAV management client
According to the system design, the UAV client module performs the acquisition, detection, and transmission of the image-related data acquired by the UAV. The UAV client contains three modules, i.e. the TDM, the DTM, and the UAV image acquisition module. The design and implementation of the first two modules have been given above, and the image acquisition module of the UAV is described below.
A UAV can fly in two ways: along a preset route or under manual control. Correspondingly, data acquisition by the UAV takes the form of either cruise-based automatic acquisition or fixed-point hovering-based acquisition. We thus designed two methods of acquiring image-related data by using the UAV: automatic photography and manual photography. Flowcharts of these two modes are displayed in Figure 9. Automatic photography involves setting the photographing interval so that the camera can automatically obtain images. Manual photography requires designing a 'Photography' button in the system such that the ground operator can obtain an image by clicking it. GPS information is collected during data acquisition.
The automatic photography mode is based on the UAV's route planning function. The program for image acquisition by the camera uses a controlled time interval; i.e. it calculates the interval of photography based on the distance to the target and the speed of the UAV, and sets it in advance. It simultaneously obtains positional information. Once an image has been acquired, the detection program based on the improved YOLOv4 algorithm performs polling detection. The pseudocode corresponding to this part is shown in Table 3.
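The interval calculation can be sketched as follows. The overlap ratio and the use of the along-track ground coverage as the 'distance' are assumptions, since the paper only states that the interval is computed from distance and speed.

```python
def photo_interval(ground_coverage_m, speed_m_s, overlap=0.2):
    """Interval (s) between shots so that consecutive images overlap.

    Sketch only: 'ground_coverage_m' is the along-track footprint of one
    image on the ground, and the 20% overlap is an illustrative default.
    """
    if speed_m_s <= 0:
        raise ValueError("speed must be positive")
    # Advance by the non-overlapping fraction of each frame's footprint.
    return ground_coverage_m * (1.0 - overlap) / speed_m_s
```

For example, a 50 m footprint at 5 m/s with 20% overlap gives an 8 s interval, which is consistent in magnitude with the 10 s setting used in the field test.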
In the manual photography mode, the UAV waits for the user to send an image acquisition instruction during flight, and then collects the data after receiving the photography command. The code for data acquisition can be obtained by removing the cyclic photography from the pseudocode shown in Table 3 and instead waiting for the instruction. The manual photography mode requires that the data transmission client receive and transmit photography instructions, and data collection is carried out according to them. The corresponding pseudocode is shown in Table 4. ImageController is used to transmit the image-related data of the UAV to the front end and to take photos automatically.

Front end of UAV management
The system uses Google's Angular framework to develop the front-end system, which uses the Antd front-end user interface (UI) component library. Its pseudocode is given in Table 6.
To display the results and realize real-time interaction, a front-end management system is designed to display the status of the UAV in real time. The corresponding image-related data, i.e. the results of detection as well as positional information, can be simultaneously obtained. Targeting the functions of UAV display, image display, and automatic photography of the front-end system, we develop an interface containing a menu showing the home page, a page displaying the list of UAVs, and a page displaying details of the images acquired by the UAV.
Embedding the <uav-home> custom tag component, implemented by the <img> tag located in the 'Other component areas' in Table 6, allowed us to generate the home page of the system. This page is the portal of the system for function navigation. The main interface of the system is shown in Figure 11. The home page contains three areas: a menu for navigation at the top, a display area in the middle, and an area displaying system information at the bottom.
After selecting UAV registration in the menu area of the home page, the user can enter the UAV display page shown in Figure 12. The embedded <uav-status> component in it is implemented by the <button>, <input>, and <table> tags, as shown in the 'Other component areas' of Table 6, to generate the UAV display page.
This page has two parts. The first is the UAV search area, obtained by integrating the input box and the button. UAVs can be searched for according to their name and IP, so that multiple UAVs registered with the UAV management server can be quickly searched for and located. The second part of the page is the area displaying the list of UAVs. It lists each UAV's name, IP, heartbeat time, and status, which present the system's management of the UAVs to the user.
Once a UAV has been selected on the display page, the user clicks the operation bar to view its data and then enters the page listing details of the images captured by the UAV, as shown in Figure 13.
The page displaying details of the images acquired by the UAV is divided into two parts: (1) the operation area, which contains the 'Take photo' button and displays brief information on the UAV being used, and (2) the data display area, which shows a thumbnail of each image acquired by the UAV. The user can click a thumbnail to view the results of target detection and the GPS information of the original image.

Experiment and analysis of the UAV-RTDS using a case study
The key technologies and implementation of the UAV-RTDS have been introduced above. In theory, this system can be applied to detect any target once its trained weight file has been obtained. To verify the performance of the system, we tested it in the field on the task of vehicle detection. Compared with similar studies (Amato et al. 2019; Ringwald et al. 2019; Ammar et al. 2021; Gupta et al. 2022), ours focuses on system integration and testing based on performance indices.

Dataset generation and the weight file for training
We used the DJI Phantom 4 Pro four-rotor UAV to collect data on vehicles on the ground. The flight altitude was set to 40-50 m, and the flights covered five parking lots on the Qingshuihe Campus of the University of Electronic Science and Technology of China (UESTC). The size of the collected images was 5472 × 3078 pixels, and they were divided into 1024 × 1024-pixel images for training. A total of 3,276 training images and 655 test images were obtained after manual screening. We used the LabelImg open-source image annotation tool to annotate each vehicle in each image and generated extensible markup language (XML) files for YOLOv4. Further, we modified the category file, configuration file, header file, and other code corresponding to YOLOv4, and then downloaded the pre-training weights. Finally, we executed 10,000 iterations on the high-performance platform on the ground to obtain the final trained weight file. We trained our model using the AdamW optimizer (Loshchilov and Hutter 2018), with an initial learning rate of lr = 0.001, a weight decay of wd = 0.00001, cosine decay, and a batch size of 32 on a workstation equipped with an 8 GB RTX 2080 GPU.
During training, the loss declined and the AP increased, which suggests that the K-YOLOv4 algorithm had learned to detect vehicles. The relationship between the number of iterations and the loss of K-YOLOv4 is shown in Figure 14. The experiments detailed below were carried out using the above dataset and the corresponding trained weight file.

Test of algorithm optimization based on K-Means
To evaluate the performance of the K-YOLOv4 algorithm, we tested it in comparison with the Faster R-CNN, YOLOv3, and YOLOv4 algorithms on our experimental data. The results are listed in Table 7.
Table 7 shows that the YOLO series of algorithms had significantly higher values on all indices than the Faster R-CNN algorithm. The K-YOLOv4 algorithm had an AP that was higher than those of the YOLOv3 and YOLOv4 algorithms by 2.88% and 0.37%, respectively, a recall rate that was higher by 1.27% and 0.47%, respectively, an accuracy that was higher by 0.80% and 0.13%, respectively, and a rate of missed detection that was lower by 1.27% and 0.47%, respectively.
We also used the open-source COCO dataset to compare the algorithms on the same data (Table 8). The results show that the AP value of K-YOLOv4 was 0.3% higher than that of YOLOv4, and its AP50 value was 0.5% higher. The K-YOLOv4 algorithm significantly reduced the number of incomplete detections of the target. We compared the original YOLOv4 algorithm with the optimized K-YOLOv4 algorithm in terms of identifying the target in the same image using the same weights. The results are shown in Figure 15. The YOLOv4 algorithm optimized by the K-Means algorithm could detect more targets, and significantly reduced the number of incomplete detections while improving the overall confidence of the results.

Algorithm optimization based on TensorRT
The TX2 contains a GPU computing unit that makes it convenient to parallelize the serial algorithms of YOLOv4 by using the Compute Unified Device Architecture (CUDA). Twenty images each of three sizes, i.e. 512 × 512 pixels, 1024 × 1024 pixels, and 2048 × 2048 pixels, were selected for testing on the NVIDIA Jetson TX2 platform. The algorithms considered here were the CUDA-accelerated versions of YOLOv4 and K-YOLOv4. The times taken by them are listed in Table 9. According to Table 9, there was little difference in time consumption between YOLOv4 and K-YOLOv4 under CUDA acceleration. The K-YOLOv4 algorithm optimized by TensorRT inference was then compared with the serial and parallel K-YOLOv4 algorithms under the same conditions. The results are shown in Table 10.
Table 10 shows that the algorithm accelerated by TensorRT-based inference performed well on the TX2, and the times taken by it on images of different sizes also differed. When the image size was not 1024 × 1024 pixels, processing took longer because the CNN needed to resample the input images, down-sampling the large images and up-sampling the small ones. That is to say, when the image size was 512 × 512 pixels or 2048 × 2048 pixels, up- or down-sampling was required, and the extra time needed for this process was not reduced by TensorRT optimization. Although this extra time was short, the overall elapsed time of detection was also very short, so the difference in overall elapsed time between the 512 × 512-pixel or 2048 × 2048-pixel images and the 1024 × 1024-pixel images appeared large. Therefore, selecting images of the appropriate size for detection effectively shortens the detection time.
The process of accelerating the K-YOLOv4 algorithm based on TensorRT inference was consistent with that of the YOLOv4 algorithm, so that the algorithm maintained a suitable balance between accuracy and speed. We compared K-YOLOv4 with other YOLO algorithms applicable to the TX2 on the same data (with an image size of 1024 × 1024 pixels), and the results are shown in Table 11.
If the running times of the serial program and the parallel program are expressed by Ts and Tn, respectively, the speedup Sp can be calculated as Sp = Ts/Tn (3). According to the test data listed in Table 10 and Equation (3), the parallel and optimized acceleration ratios (speedup) of the K-YOLOv4 algorithm for images of different sizes on the TX2 development board were obtained, as shown in Figure 16.
Figure 16 shows that the acceleration ratio of the CUDA-parallel K-YOLOv4 algorithm was between four and five, while that of the K-YOLOv4 algorithm optimized by TensorRT inference was 62-80. TensorRT inference-based optimization thus delivered a considerably higher acceleration for images of different sizes and significantly reduced the detection time of the TDM.
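Equation (3) amounts to a single division; the following worked example uses illustrative times, not the measured values of Table 10.

```python
def speedup(t_serial, t_parallel):
    """Speedup Sp = Ts / Tn from Equation (3)."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel

# Hypothetical figures: a serial detection of 8.0 s cut to 0.1 s by
# TensorRT-optimized inference corresponds to a speedup of 80, the upper
# end of the range reported in Figure 16.
```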

Experiment on vehicle detection
We also tested the entire process of photography, detection, transmission, display, and control of the proposed framework. The configuration of the software environment is shown in Table 12. The user can use any device with a browser installed on it to access the server.
The experiment was conducted in a parking lot behind the School of Physics on the Qingshuihe Campus of UESTC. The experiment was divided into two groups: a static vehicle was detected by the UAV while it hovered 30 m above the ground, and a moving vehicle was detected by it at a height of 60 m. The process used an orthophoto angle of view, and the image size was 3840 × 2160 pixels. An overview of the experimental site and the experimental devices is shown in Figure 17.

Detecting a static vehicle
The embedded equipment of the system was powered by the UAV. After ensuring the normal operation of the UAV, we turned on the TX2 development board so that it and the server were on the same LAN. Having ensured the correctness of the above process, we started the UAV management server and the UAV client. We then entered the top menu bar of the front-end management system's home page, 'UAV Management > Target Detection > My UAV,' to view the status of the UAV. Its name, IP, registration time, and heartbeat time are shown in the table in Figure 12, and indicate that the UAV had been correctly registered with the server. The UAV management system could thus manage it normally.
We planned a route for the UAV at an altitude of 30 m by using the DJI app, and initiated automatic photography (the control program on the development board had been set up so that image data were obtained every 10 s) to collect the images. During its operation, we could click the 'View data' link in the operation bar to view the results of detection and the GPS information corresponding to each image, as shown in Figure 13.
If the user needed to view the details of an instance of detection, they could click the thumbnail of the image to enlarge it. The results of detection of the stationary vehicle obtained by the UAV hovering at a fixed point are shown in Figure 18.
We then verified the manual photography function of the proposed framework. Its operation is shown in Figure 13. During operation, the user could click the 'Take Photo' button (located in the upper-right side of Figure 13) to execute the manual photography mode.
The image of a target obtained by manual photography is shown in Figure 19. A total of 87 results were obtained in the static detection experiment, with an overall confidence of 89.6% and a rate of missed detection of 3.8%.

Detecting a moving vehicle
We then increased the altitude of the UAV to 60 m, and used it to detect moving vehicles on the road as well as parked vehicles on the roadside. The results are shown in Figure 20.
A total of 66 photos were taken in the experiment on dynamic detection, with an overall confidence of 88.2% and a rate of missed detection of 4.6%.
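The two indices reported above can be computed as follows. The paper does not state its exact formulas, so this sketch assumes that the overall confidence is the mean confidence over all detections and that the rate of missed detection is the fraction of ground-truth vehicles with no corresponding detection; both names are illustrative.

```python
def detection_metrics(confidences, missed, total_targets):
    """Overall confidence and rate of missed detection (assumed definitions).

    confidences: confidence score of each detection;
    missed: number of ground-truth targets with no detection;
    total_targets: total number of ground-truth targets.
    """
    if not confidences or total_targets <= 0:
        raise ValueError("need at least one detection and one target")
    overall_confidence = sum(confidences) / len(confidences)
    miss_rate = missed / total_targets
    return overall_confidence, miss_rate
```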
The above experiments show that the UAV-RTDS ran smoothly and that its components interacted well. After targeted optimization, the YOLOv4 algorithm delivered accurate detection of dynamic and static targets at different distances. The entire system achieved the expected effect in application, particularly in terms of availability and stability.

Conclusions and future work
In this study, we designed and implemented a real-time target detection system for UAVs on an embedded platform, and used vehicle detection as an example to verify its performance and stability. The entire system can be used as a development paradigm and transferred to other UAV applications, such as pedestrian detection, change detection, and surface classification. However, the proposed system is not yet fully practical. It takes about 20-30 s (depending on such factors as the image size, the mode of operation of the TX2, and the transmission network) from sending instructions to visualizing the results, and this can be regarded only as quasi-real-time processing. Certain factors have not been considered in the system, and further tests of it are needed in different scenarios. The following issues need to be dealt with in future research: (1) The management system should be further improved. (2) The DTM can be improved to operate over longer distances and at higher transmission speeds. For instance, the use of long-term evolution technology for data transmission should be explored. (3) An automatic, concise, and widely applicable target detection system that can be applied to many kinds of targets should be developed. (4) The TX2 needs to be powered by the DJI M600 Pro during processing. The average power of the former is 5.6-10.3 W while that of the latter is 188-370 W. Although the power of the TX2 is only 1.5%-5.5% of that of the DJI M600 Pro, it can still reduce the flight time of the UAV.

Note: YOLOv4-MobileNetV3 used MobileNetV3 (Howard et al. 2019) in place of the backbone network CSPDarknet53 of YOLOv4, while YOLOv4-Tiny is the official lightweight version of YOLOv4.

Figure 16. Ratio of acceleration before and after the optimization of the K-YOLOv4 algorithm.
the You Only Look Once (YOLO) detection algorithm. It can use images of the target to predict its position and category through a neural network, but its detection accuracy is poor. In 2016, Liu et al. proposed the single-shot multi-box detector (SSD) algorithm, which improves on the YOLO detection algorithm and uses the VGG-16 deep convolutional neural network (Simonyan and Zisserman 2014) to extract multi-scale feature maps and directly output the location of the target of detection. In 2018, Redmon et al. proposed the YOLOv3 algorithm for target detection, which has a high accuracy and speed of detection. The YOLOv4 algorithm was released in April 2020 as a leading deep learning detector. It heralded a new era in target detection owing to its high speed and accuracy of detection as well as its suitability for use with embedded devices.

Figure 1 .
Figure 1. Composition of the hardware of the UAV-RTDS.

Figure 2 .
Figure 2. Overall framework and operational principle of the UAV-RTDS.

(4) Enter the distances into the set, and compare the distances between each bounding box and the fixed boxes. Further, select the combination with the minimum distance and assign the bounding box to that fixed box. Record the bounding boxes contained in each fixed box. (5) Calculate the median height and width of the bounding boxes contained in each fixed box obtained from step (4), and use them as the new size of the fixed box. (6) Repeat the operations in steps (4) and (5) until the size of the fixed box no longer changes.
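Steps (4)-(6) can be sketched as follows. The 1 − IoU distance between box sizes is an assumption, since the distance measure defined in the omitted earlier steps is not restated here; only the assignment, median update, and convergence test follow the text.

```python
def kmeans_anchors(boxes, anchors, max_iter=100):
    """Cluster bounding-box sizes into fixed (anchor) boxes per steps (4)-(6).

    boxes and anchors are (width, height) pairs.
    """
    def median(values):
        values = sorted(values)
        return values[len(values) // 2]

    def distance(box, anchor):
        # Assumed metric: 1 - IoU of the two boxes aligned at a common corner.
        inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
        union = box[0] * box[1] + anchor[0] * anchor[1] - inter
        return 1.0 - inter / union

    for _ in range(max_iter):
        # Step (4): assign each bounding box to its nearest fixed box.
        clusters = [[] for _ in anchors]
        for box in boxes:
            best = min(range(len(anchors)), key=lambda i: distance(box, anchors[i]))
            clusters[best].append(box)
        # Step (5): the median width/height of each cluster becomes the new size.
        updated = [
            (median([b[0] for b in c]), median([b[1] for b in c])) if c else a
            for c, a in zip(clusters, anchors)
        ]
        # Step (6): stop once the fixed boxes no longer change.
        if updated == anchors:
            break
        anchors = updated
    return anchors
```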

Figure 6 .
Figure 6. Sequence diagram of the interaction between the client and the server.

Figure 7 .
Figure 7. Client end of the DTM based on the producer-consumer model.

Figure 8 .
Figure 8. Structure and composition of modules of the UAV MSS.

Figure 9 .
Figure 9. Flowchart of the automatic and manual photography modes.

Figure 13 .
Figure 13. The embedded <button>, <div> tags, and <uav-preview> components are implemented by the open-source preview component <lightbox>, as shown in the 'Other component areas' of the pseudocode in Table 6. This yields details of the images acquired by the UAV. The page displaying details of the images acquired by the UAV is divided into two parts: (1) the operation area, which contains the 'Take photo' button and displays brief information on the UAV being used, and (2) the data display area, which shows a thumbnail of each image acquired by the UAV. The user can click a thumbnail to view the results of target detection and the GPS information of the original image.

Figure 12 .
Figure 12. Page displaying the list of UAVs.

Figure 13 .
Figure 13. Page displaying details of the images acquired by the UAV.

Figure 14 .
Figure 14. Changes in the loss value of the K-YOLOv4 algorithm with increasing number of iterations.

Figure 15 .
Figure 15. Comparison of the results of detection of YOLOv4 before and after optimization.

Figure 17 .
Figure 17. Overview of the experimental site and the experimental devices.

Figure 18 .
Figure 18. Fixed-point detection by the UAV at a height of 30 m.

Figure 19 .
Figure 19. Target detection by the UAV based on the manual photography mode at an altitude of 30 m.

Figure 20 .
Figure 20. Detection of a moving vehicle by the UAV based on automatic photography at a height of 60 m.

Table 1 .
Comparison of performance between TX2 and an ordinary PC.

Table 2 .
Comparison of times taken by the original YOLOv4 algorithm on an ordinary PC and the TX2.
Figure 3. Flowchart of optimization of the YOLOv4 algorithm based on the K-Means algorithm.

Table 5 .
Pseudocode for the implementation of the GPS module.
Figure 10. Concept diagram of the MVC mode.

Table 6 .
Pseudocode of the implementation of the front-end page.

Table 7 .
Performance of K-YOLOv4 in comparison with other algorithms on the same dataset.

Table 8 .
Comparison of the algorithms (Ren et al. 2015; Bochkovskiy, Wang, and Liao 2020) on the same dataset. Note: The input characteristic diagram of all models during training was 416 × 416, and the training parameters of K-YOLOv4 were the same as those in the text. Except for those of K-YOLOv4, all other results are from the corresponding original papers (Ren et al. 2015; Bochkovskiy, Wang, and Liao 2020).

Table 9 .
Comparison of CUDA acceleration times between the YOLOv4 and the K-YOLOv4 algorithms.

Table 10 .
Comparison of times consumed by the K-YOLOv4 algorithm before and after optimization.

Table 11 .
Comparison of the precision and speed of the optimized proposed method with K-YOLOv4, YOLOv4-Tiny, and YOLOv4-MobileNetV3.

Table 12 .
Software configuration of the test environment.