AERIAL VEHICLES DETECTION AND RECOGNITION FOR UAV VISION SYSTEM

This article focuses on the detection and recognition of aerial vehicles by a wide-field-of-view monocular vision system that can be installed on UAVs (unmanned aerial vehicles). The objects are mostly observed against a background of clouds under regular daylight conditions. The main idea is a multi-step approach based on preliminary detection, regions of interest (ROI) selection, contour segmentation, object matching, and localization. The described algorithm is able to detect small targets but, unlike many other approaches, is designed to work with large-scale objects as well. The suggested algorithm is also intended to recognize and track aerial vehicles of a specific kind using a set of reference objects defined by their 3D models. For that purpose, a computationally efficient contour descriptor is calculated for the models and the test objects. Experimental research on real video sequences is performed. The video database contains different types of aerial vehicles: airplanes, helicopters, and UAVs. The proposed approach shows good accuracy in all case studies and can be implemented in onboard vision systems.


Introduction
Unmanned aerial vehicles (UAVs) have gone through an intensive development period in the last decade, and in many applications they have proved their worth. Nowadays, the appearance of large numbers of UAVs raises new problems such as autonomous navigation, early object detection and recognition, 3D scene reconstruction, and collision avoidance. It should be noted that many of these tasks were previously solved by radar-based systems. Such systems are reliable but, unfortunately, cannot be installed on small aerial vehicles because of their weight, size, and energy consumption. Therefore, researchers' attention has turned to high-resolution image sensors. This paper is devoted to the detection and classification of aerial vehicles by a wide-field-of-view monocular vision system. This is essential for collision avoidance, autonomous drone swarm deployment, airspace patrol and monitoring, and security applications.
A typical image of an aircraft at a range of several kilometers is only a few pixels in diameter. Part of the challenge is detecting such small targets at a low signal-to-background ratio. On the other hand, the object size constantly grows as the object approaches. A large object has high contrast but contains so much detail that it can lead to poor estimation of the object parameters. Examples of the observed types of aerial vehicles are shown in Fig. 1.
Well-known object detection algorithms are not always invariant to scale transformation and are used primarily for small target detection [2, 3]. However, some relatively recent research efforts look promising [4, 5]. Alternative approaches based on algorithm switching have already been applied to ground object detection, for example [6].
However, developing a more reliable algorithm for early object detection and confident recognition under different observation conditions is still an important problem.
Traditionally, several different approaches are used for recognition, including hidden Markov models, feature points, tangent/turning functions, curvature maps, shock graphs, Fourier descriptors, etc. [4].

Fig. 1. Different types of observed aerial vehicles
These approaches have their benefits and drawbacks regarding computational complexity, precision, implementation issues, robustness, and scalability. Other edge-based approaches include Chamfer distance based methods for recognizing objects through smaller shape fragments [7]. Complex algorithms based on machine learning [8] are being actively developed, but they still have high computational costs.
Aerial vehicles often have homogeneous brightness in the image, so their shape information is more relevant. In this paper, a relatively simple shape descriptor is used. It is computationally efficient and suited for onboard systems.
This article focuses on the detection and recognition of aerial vehicles (airplanes, helicopters, UAVs), mostly against a cloudy background. The main idea is a multi-step approach based on preliminary detection, regions of interest (ROI) selection, object contour segmentation, contour descriptor calculation, object matching, and recognition.
In the next section, the object detection algorithm is described in detail. Then contour descriptor evaluation, object matching, and recognition are discussed. Finally, we present experimental results for the proposed approach obtained on natural video sequences.

Object detection
Preliminary detection. The algorithm used to detect objects at the preliminary step should satisfy two requirements: it should be computationally efficient, and it should work well in a cloudy and noisy environment. It can be assumed that objects have higher contrast than the underlying background. Neighboring background pixels usually have similar brightness values, so the background is concentrated at low spatial frequencies in the Fourier domain. In that case, objects can, under some assumptions, be described as blobs. Spatial filters are typically used for blob detection to increase the SNR and to get better results.
The image l(i, j) is filtered with a small averaging mask h1 of size q1 × q1, and the background is estimated with a larger mask h2 of size q2 × q2 (q2 > q1) that excludes the central q1 × q1 area:

$$h_1(m, n) = \frac{1}{q_1^2}, \quad |m|, |n| \le \frac{q_1 - 1}{2}; \qquad h_2(m, n) = \begin{cases} \dfrac{1}{q_2^2 - q_1^2}, & \dfrac{q_1 - 1}{2} < \max(|m|, |n|) \le \dfrac{q_2 - 1}{2}, \\ 0, & \text{otherwise}. \end{cases}$$

After that, the background estimate is subtracted from the image filtered with mask h1:

$$d(i, j) = (l * h_1)(i, j) - (l * h_2)(i, j).$$

The difference image d(i, j) contains objects and remaining clutter with a nearly Gaussian zero-mean spatial distribution. This follows from the large number of pixels in the image and the central limit theorem. Taking into account the nature of the distribution, a thresholding procedure can be used to get the object binary mask b(i, j):

$$b(i, j) = \begin{cases} 1, & |d(i, j)| > k\sigma, \\ 0, & \text{otherwise}, \end{cases}$$

where k is a threshold coefficient and σ is the standard deviation of the difference image.

However, applying the blob detection procedure in practice faces a number of problems. The disadvantages of the approach are explained by the locality of spatial processing techniques. Clearly, the appropriate size of the filter mask depends on the object size in the image. A large object is often fragmented, and it is impossible to determine its shape correctly. Besides, atmospheric turbulence and background clutter such as high-contrast clouds cause false detections. The next processing steps are performed to achieve scale invariance, reject false detections, and refine the object shape.
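A minimal NumPy sketch of the preliminary blob detector follows. It assumes h1 is a q1 × q1 averaging mask and h2 is a q2 × q2 ring mask that excludes the central q1 × q1 area; these mask shapes, the use of |d|, and the defaults q1 = 3, q2 = 9, k = 3 are illustrative assumptions, not values from this work. Box sums are computed with an integral image, so the cost does not depend on the mask size.

```python
import numpy as np


def box_sum(img, size):
    """Sum of img over a size x size window centred on each pixel (size must be odd)."""
    r = size // 2
    padded = np.pad(img, r, mode="edge")  # replicate borders to avoid edge artefacts
    # Integral image with a leading row and column of zeros.
    c = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    c[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    return (c[size:size + h, size:size + w] - c[:h, size:size + w]
            - c[size:size + h, :w] + c[:h, :w])


def detect_blobs(img, q1=3, q2=9, k=3.0):
    """Return (binary mask b, difference image d) for a grayscale image."""
    img = img.astype(float)
    inner = box_sum(img, q1)
    outer = box_sum(img, q2)
    filtered = inner / q1 ** 2                          # image filtered with h1
    background = (outer - inner) / (q2 ** 2 - q1 ** 2)  # ring mask h2 (background estimate)
    d = filtered - background
    b = (np.abs(d) > k * d.std()).astype(np.uint8)      # threshold at k standard deviations
    return b, d
```

Edge padding keeps a flat background at d ≈ 0 near the image border; zero padding would create spurious responses there.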
ROI selection. To achieve invariance to scale transformation, a Gaussian image pyramid is created, and the blob detection algorithm described above is applied at each level. The filter mask sizes are fixed, but the coefficient k slightly increases as image detail degrades. For each pyramid level, a binary image is formed and a list of segments is created. Segment analysis at different scales is the part of the algorithm that allows regions of interest to be selected.
The analysis starts from the coarse image resolution and proceeds to more detailed levels. Simple morphological operations are used to reduce segment fragmentation at low resolutions. The bounding box of each segment is expanded by a margin that depends on its initial size. Then intersections between bounding boxes at different scales are searched for. Intersected regions are counted and excluded from the list. As a rule, large objects in the image are more fragmented at detailed scale levels. This property is used to locate large objects. An example of the binary mask of a test object at different levels of the pyramid is shown in Fig. 2.
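The coarse-to-fine intersection analysis can be sketched as follows. The sketch assumes that each pyramid level halves the resolution and that segment bounding boxes (x0, y0, x1, y1, with exclusive upper bounds) have already been extracted per level; pyramid construction and connected-component labeling are omitted. The helper names, the 25 % box expansion with a 2-pixel minimum, and the min_hits threshold are hypothetical choices made for illustration.

```python
def expand(box, frac=0.25, min_pad=2):
    """Grow an (x0, y0, x1, y1) box by a margin proportional to its size."""
    x0, y0, x1, y1 = box
    px = max(min_pad, int(frac * (x1 - x0)))
    py = max(min_pad, int(frac * (y1 - y0)))
    return (x0 - px, y0 - py, x1 + px, y1 + py)


def scale(box, factor):
    return tuple(v * factor for v in box)


def intersects(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def select_rois(levels, min_hits=3):
    """levels[0] holds boxes of the finest level, levels[-1] of the coarsest.

    Returns ROIs in finest-level coordinates and the finest-level boxes
    that were not absorbed by any ROI (candidate small targets).
    """
    levels = [list(boxes) for boxes in levels]  # work on copies
    top = len(levels) - 1
    rois = []
    for box in levels[top]:
        region = expand(box)
        # The same region expressed in the coordinates of each finer level.
        scaled = [scale(region, 2 ** (top - lv)) for lv in range(top)]
        hits = sum(intersects(scaled[lv], b) for lv in range(top) for b in levels[lv])
        if hits >= min_hits:  # fragmented at finer levels: likely a large object
            rois.append(scale(region, 2 ** top))
            for lv in range(top):
                levels[lv] = [b for b in levels[lv] if not intersects(scaled[lv], b)]
    return rois, levels[0]
```

Intersected boxes are removed only when the coarse box is accepted as an ROI, so an isolated small target that happens to overlap a coarse segment is not lost.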

Fig. 2. Image and binary mask at three levels of the pyramid. The black rectangle corresponds to the ROI used for contour segmentation
Thus, bounding boxes that are found at a coarse resolution and have more intersections at higher resolutions are probably related to objects and are treated as ROIs. All ROIs are transformed to one resolution scale. The remaining small segments found in the original image are small targets. They can be described by their position, size, and average brightness. These characteristics are used for matching based on minimization of the relative differences of object properties. Such a small object can be tracked, and its velocity can help to increase recognition accuracy later.
In contrast, large objects in the binary image can look deformed. The segment centroid is often shifted, which leads to significant errors in recognition. To achieve better results, the object shape is restored at the next step by processing the ROIs with a contour segmentation algorithm that is robust to illumination changes.
Contour segmentation. At this step, a more complicated segmentation procedure is performed in each ROI to estimate the object contour. We chose the active contour model as a powerful and flexible approach that can precisely segment the object boundary. More importantly, this approach guarantees that the contour will be closed and will not contain gaps.
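A minimal sketch of small-target matching by relative differences of position, size, and average brightness is given below. The dictionary layout, the optional velocity keys vx and vy, and the equal weighting of the three terms are assumptions; normalizing the displacement by the target size is only one possible way to make the position term relative.

```python
import math


def match_cost(prev, cand):
    """Sum of relative differences between a tracked small target and a candidate.

    prev and cand are dicts with keys x, y, size, brightness; prev may also
    carry a velocity (vx, vy) in pixels per frame from earlier tracking.
    """
    px = prev["x"] + prev.get("vx", 0.0)  # predicted position
    py = prev["y"] + prev.get("vy", 0.0)
    size = max(prev["size"], 1.0)
    return (math.hypot(cand["x"] - px, cand["y"] - py) / size
            + abs(cand["size"] - prev["size"]) / size
            + abs(cand["brightness"] - prev["brightness"]) / max(abs(prev["brightness"]), 1.0))


def match_small_target(prev, candidates):
    """Index and cost of the candidate with the smallest relative difference."""
    costs = [match_cost(prev, c) for c in candidates]
    best = min(range(len(costs)), key=costs.__getitem__)
    return best, costs[best]
```

Using the predicted position instead of the last observed one is where the tracked velocity enters; with vx = vy = 0 the cost reduces to plain relative differences.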
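The model is based on the Mumford-Shah functional minimization problem [9]. For simplicity, let us assume that the images are continuous. The general form of the Mumford-Shah energy functional for a sensed image l(x, y) can be written as

$$E(r, C) = \mu \, \mathrm{Length}(C) + \nu \iint_{\Omega} \big(l(x, y) - r(x, y)\big)^2 \, dx \, dy + \iint_{\Omega \setminus C} \big|\nabla r(x, y)\big|^2 \, dx \, dy, \tag{4}$$

where μ and ν are positive constants, r(x, y) is the segmented (piecewise smooth) image, Ω is the image domain, and C is the object boundary curve. Finding C is a difficult problem, since r(x, y) is also an unknown function in 2D coordinate space. The expression can be simplified if r(x, y) is a piecewise constant function that takes the value r1 inside C and r0 outside C. In that case, energy functional (4) is reformulated as follows:

$$E(r_1, r_0, C) = \mu \, \mathrm{Length}(C) + \nu \iint_{\mathrm{inside}(C)} \big(l(x, y) - r_1\big)^2 \, dx \, dy + \nu \iint_{\mathrm{outside}(C)} \big(l(x, y) - r_0\big)^2 \, dx \, dy. \tag{5}$$

Expression (5) describes the Chan-Vese active contour model [9, 10]. Its two fitting terms act as opposing forces: one tends to expand the contour and the other tends to compress it, while the length term keeps the contour smooth. The problem is to find the object boundary at which equilibrium between the two forces is reached. The unknown curve C is replaced by the level set function ϕ(x, y), with ϕ(x, y) > 0 if the point (x, y) is inside C, ϕ(x, y) < 0 if (x, y) is outside C, and ϕ(x, y) = 0 if (x, y) is on C. Finally, the minimization problem is solved by deriving the Euler-Lagrange equation and updating the level set function ϕ(x, y) by gradient descent:

$$\frac{\partial \phi}{\partial t} = \delta(\phi) \Big[ \mu K(\phi) - \nu \big(l(x, y) - r_1\big)^2 + \nu \big(l(x, y) - r_0\big)^2 \Big], \qquad K(\phi) = \nabla \cdot \frac{\nabla \phi}{|\nabla \phi|}, \tag{6}$$

where r1 and r0 are the average brightness values of the object and the background, respectively, δ(ϕ) is an approximation of the Dirac delta function, and K(ϕ) is the curvature of the curve. In the transition from continuous (x, y) to discrete (i, j) coordinates, equation (6) is transformed to

$$\phi^{\tau+1}(i, j) = \phi^{\tau}(i, j) + \Delta t \, \delta\big(\phi^{\tau}(i, j)\big) \Big[ \mu K\big(\phi^{\tau}(i, j)\big) - \nu \big(l(i, j) - r_1\big)^2 + \nu \big(l(i, j) - r_0\big)^2 \Big], \tag{7}$$

where τ is the iteration number and Δt is the time step.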
Every n iterations, ϕ(x, y) is reinitialized to be the signed distance function to its zero level set. This procedure prevents the level set function from becoming too flat, an effect of the δ(ϕ)-weighted update, which smooths ϕ(x, y). The iterative search stops when the number of points where the level set function is close to zero ceases to vary noticeably [9].
This method can detect objects whose boundaries are blurred or not necessarily defined by the gradient. It does not require image filtering and can efficiently process noisy images. Therefore, the true boundaries are preserved and can be accurately detected. Additionally, with an appropriate choice of the Dirac function approximation, it can automatically detect interior contours [10].
However, the Chan-Vese model also has some drawbacks: unsuccessful segmentation of images with significant intensity inhomogeneity, sensitivity to the initial contour placement, and a time-consuming iterative solution procedure. In this work, images are segmented only in the areas determined by the ROIs, which in most cases are centered on objects. The influence of image inhomogeneity on the segmentation results is noticeable for large-scale objects but can be significantly reduced by downsampling the image in the Gaussian pyramid. Thus, the main drawbacks of the approach can be overcome.
The next section describes the contour descriptor calculation, followed by the object matching and recognition steps of the algorithm.
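The discrete gradient-descent update of ϕ, together with the periodic reinitialization and the stopping rule described above, can be sketched in NumPy as follows. This is a simplified illustration under stated assumptions: the arctan-based δ approximation with ε = 1 follows Chan and Vese, the region means r1 and r0 are taken over the hard masks ϕ > 0 and ϕ ≤ 0, and μ, ν, Δt, the reinitialization interval, and the tolerance are illustrative values. The brute-force signed distance is practical only for small ROIs.

```python
import numpy as np


def dirac(phi, eps=1.0):
    """Smoothed Dirac delta used by Chan and Vese."""
    return (eps / np.pi) / (eps ** 2 + phi ** 2)


def curvature(phi):
    """K(phi) = div(grad phi / |grad phi|) with central differences."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    return np.gradient(gx / norm, axis=1) + np.gradient(gy / norm, axis=0)


def reinit(phi):
    """Brute-force signed distance to the zero level set (positive inside)."""
    inside = phi > 0
    p = np.pad(inside, 1, mode="edge")
    outside_nb = ~p[:-2, 1:-1] | ~p[2:, 1:-1] | ~p[1:-1, :-2] | ~p[1:-1, 2:]
    by, bx = np.nonzero(inside & outside_nb)  # inner boundary pixels
    if by.size == 0:
        return phi
    yy, xx = np.indices(phi.shape)
    dist = np.sqrt((yy[..., None] - by) ** 2 + (xx[..., None] - bx) ** 2).min(axis=-1)
    # The +/-0.5 offset keeps boundary pixels on the correct side of the zero level.
    return np.where(inside, dist + 0.5, -(dist - 0.5))


def chan_vese(img, phi, mu=0.2, nu=1.0, dt=0.5, reinit_every=10, tol=2, max_iter=500):
    l = img.astype(float)
    l = (l - l.min()) / (l.max() - l.min() + 1e-12)  # brightness normalised to [0, 1]
    phi = phi.astype(float)
    prev_count = None
    for it in range(1, max_iter + 1):
        inside = phi > 0
        r1 = l[inside].mean() if inside.any() else 0.0      # object mean
        r0 = l[~inside].mean() if (~inside).any() else 0.0  # background mean
        force = mu * curvature(phi) - nu * (l - r1) ** 2 + nu * (l - r0) ** 2
        phi = phi + dt * dirac(phi) * force
        if it % reinit_every == 0:
            phi = reinit(phi)
            count = np.count_nonzero(np.abs(phi) < 1.0)  # points near the zero level
            if prev_count is not None and abs(count - prev_count) <= tol:
                break
            prev_count = count
    return phi > 0
```

The stopping count is taken right after reinitialization, so it measures the width of the band around the contour and stops changing once the contour itself stops moving.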

Contour descriptor calculation
A contour descriptor is a numeric vector associated with a specified object contour. It reduces the amount of information describing the object contour and increases the speed of contour matching [11].
The proposed descriptor can be calculated using either the object binary image b or the contour C. In the first case, the external contour of the object is extracted after image binarization. The contour points are translated into a polar coordinate frame centered at the object centroid. The obtained vector of polar coordinates is discretized and median filtered.
The resulting descriptor units are calculated using the following equation:

$$D(i) = F_{\mathrm{med}}\Big\{ \max_{P(\alpha_i, \Delta\alpha)} d\big(P(\alpha_i, \Delta\alpha), P_{\mathrm{center}}\big) \Big\}, \qquad \alpha_i = \frac{2\pi (i - 1)}{N_D}, \quad i = 1, \dots, N_D, \tag{8}$$

where i is the number of the current descriptor unit; ND is the total number of descriptor units; d(P1, P2) is the Euclidean distance between P1 and P2; Pcenter is the position of the object centroid; P(α, Δα) is any object or object contour point situated in the sector of the circle bounded by the angles α ± Δα (the circle is centered at Pcenter); and Fmed{…} denotes the median filtering operation.
As the object contour is a closed curve, it generates a series of descriptors that are circularly shifted relative to one another depending on the starting angle. The descriptor with the maximal D(1) unit is used as the object descriptor.
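A sketch of the descriptor computation follows. It assumes that each unit is the distance from the centroid to the farthest point in its angular sector, that sectors have equal width 2π/ND, and that the median filter has 3 taps and wraps around; ND = 36 and the filter length are illustrative choices.

```python
import numpy as np


def circular_median(v, size=3):
    """Median filter that wraps around, since the descriptor is periodic in angle."""
    r = size // 2
    return np.median(np.stack([np.roll(v, s) for s in range(-r, r + 1)]), axis=0)


def contour_descriptor(points, center, n_units=36):
    """points: (K, 2) array of (x, y) contour or object pixel coordinates."""
    d = np.asarray(points, dtype=float) - np.asarray(center, dtype=float)
    angles = np.mod(np.arctan2(d[:, 1], d[:, 0]), 2 * np.pi)
    radii = np.hypot(d[:, 0], d[:, 1])
    sector = (angles // (2 * np.pi / n_units)).astype(int) % n_units
    desc = np.zeros(n_units)
    np.maximum.at(desc, sector, radii)  # farthest point in each sector
    desc = circular_median(desc)
    # Choose the circular shift that puts the largest unit first (maximal D(1)).
    return np.roll(desc, -int(np.argmax(desc)))
```

Sectors that receive no points stay at zero before filtering; the median filter suppresses such isolated gaps.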
Steps of calculating the contour descriptor are illustrated in Fig. 3.
Object matching is performed by minimizing the criterion function

$$F_{\mathrm{crit}}(j) = \min_{m} \sum_{i=1}^{N_D} \big( D_{\mathrm{ob}}(i) - D_j(i + m) \big)^2, \qquad j = 0, \dots, N - 1, \tag{9}$$

where Dob is the object contour descriptor found in the previous frame, Dj is the descriptor of the j-th object candidate, N is the number of objects in the current frame, and m is the value of the circular shift of a descriptor (the index i + m is taken modulo ND). Image rotation results in circular shifts of the contour descriptor, which is taken into account in (9). Thus, the matching process is invariant to object rotation, scale, and shift. The minimum of Fcrit(j) over all j determines the most similar object.
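Matching with circular shifts can be sketched as below. It assumes the sum of squared differences as the distance inside the criterion; other distances would be handled the same way. The identity np.roll(d, -m)[i] == d[(i + m) mod ND] implements the circular shift.

```python
import numpy as np


def shift_distance(d_ref, d_cand):
    """Minimum over circular shifts m of sum_i (d_ref[i] - d_cand[(i + m) mod N_D])**2."""
    d_ref = np.asarray(d_ref, dtype=float)
    d_cand = np.asarray(d_cand, dtype=float)
    # np.roll(d_cand, -m)[i] == d_cand[(i + m) % N_D]
    costs = np.array([np.sum((d_ref - np.roll(d_cand, -m)) ** 2) for m in range(len(d_ref))])
    m = int(np.argmin(costs))
    return costs[m], m


def match_object(d_ob, candidates):
    """Index j minimising Fcrit(j) over the candidates of the current frame."""
    scores = [shift_distance(d_ob, d_j)[0] for d_j in candidates]
    return int(np.argmin(scores)), scores
```

The exhaustive search over all ND shifts costs O(ND²) per candidate, which is small for descriptors of a few dozen units.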

Object recognition
The proposed object recognition algorithm consists of two stages. The first stage is reference object database preparation, or learning. At this stage, the reference object descriptors are calculated using 3D object models. The reference database includes a set of descriptors calculated for a number of object poses with different combinations of Euler angles. We suggest using the geosphere principle to distribute the object poses uniformly on the sphere. Since this stage includes many complicated operations (such as 3D model rendering), it is performed in advance [12].
The second stage of the algorithm is object recognition. It is also based on the description of the extracted image contour and is similar to the object matching algorithm. This stage is performed onboard in real time.
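One common way to realize the geosphere principle is to subdivide the faces of an icosahedron and project the resulting points onto the unit sphere. A subdivision factor f yields 10f² + 2 points, so factor 3 gives the 92 points used in the experiments. The sketch below assumes this construction; view_angles is a hypothetical conversion to an azimuth/elevation pair, since the text does not specify how sphere points map to Euler angles.

```python
import itertools

import numpy as np


def icosahedron():
    """12 vertices (edge length 2) and the 20 triangular faces of a regular icosahedron."""
    g = (1 + 5 ** 0.5) / 2  # golden ratio
    verts = []
    for a in (-1.0, 1.0):
        for b in (-g, g):
            verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    verts = np.array(verts)
    # Faces are the vertex triples whose pairwise distances all equal the edge length.
    faces = [f for f in itertools.combinations(range(12), 3)
             if all(np.isclose(np.linalg.norm(verts[p] - verts[q]), 2.0)
                    for p, q in itertools.combinations(f, 2))]
    return verts, faces


def geosphere(freq=3):
    """Points of a frequency-`freq` geodesic sphere (10 * freq**2 + 2 points)."""
    verts, faces = icosahedron()
    unique = set()
    for a, b, c in faces:
        A, B, C = verts[a], verts[b], verts[c]
        for i in range(freq + 1):
            for j in range(freq + 1 - i):
                p = (i * A + j * B + (freq - i - j) * C) / freq
                p = p / np.linalg.norm(p)
                # Rounding merges points shared by neighbouring faces; +0.0 turns -0.0 into 0.0.
                unique.add(tuple(np.round(p, 9) + 0.0))
    return np.array(sorted(unique))


def view_angles(p):
    """Hypothetical mapping of a sphere point to (azimuth, elevation) in degrees."""
    x, y, z = p
    return np.degrees(np.arctan2(y, x)), np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))
```

Each geosphere point then defines one camera direction for rendering a reference view; the third angle is not sampled, because in-plane rotation only circularly shifts the descriptor.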
The most probable pose is estimated by matching the contour descriptor of the query image with the training descriptors. Descriptor matching is performed by calculating the criterion function

$$E_j = \min_{s} \Big[ \sum_{i=1}^{N_D} \big( D_0(i) - D_j(i + s) \big)^2 \Big], \tag{11}$$

where D0 is the query image descriptor, Dj is the descriptor of the current training image, and s is the value of the circular shift of the descriptor (the index i + s is taken modulo ND). This criterion function makes the matching invariant to in-plane rotation. The index of a training descriptor corresponds to a geosphere point; therefore, it determines the Euler angles α and β. Let s0 be the shift value that minimizes the expression in square brackets in (11):

$$s_0 = \arg\min_{s} \Big[ \sum_{i=1}^{N_D} \big( D_0(i) - D_j(i + s) \big)^2 \Big].$$

Hence, the angle γ is calculated by the formula

$$\gamma = \frac{2\pi s_0}{N_D}.$$

Calculating criterion function (11) for every training descriptor of the k-th reference object gives the vector of criterion values

$$\mathbf{E}^{(k)} = \big( E^{(k)}_1, E^{(k)}_2, \dots, E^{(k)}_J \big),$$

where J is the number of training descriptors (geosphere points) per reference object. The measure of similarity between the captured object and the k-th reference object is the value Rk, defined as the minimal distance between the sets of object descriptors:

$$R_k = \min_{j} E^{(k)}_j.$$

Recognition is performed by finding the least value of Rk:

$$k^{*} = \arg\min_{k} R_k, \qquad k = 1, \dots, K,$$

where k* is the index of the most similar reference object and K is the cardinality of the reference object set.
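The recognition stage can be sketched as follows. It assumes the same squared-difference distance as in matching, a reference database stored as one array of training descriptors per reference object (one row per geosphere point), and γ reported in degrees. The function names and the data layout are hypothetical.

```python
import numpy as np


def criterion(d0, dj):
    """Criterion (11): min over circular shifts s, plus the minimising shift s0."""
    d0 = np.asarray(d0, dtype=float)
    dj = np.asarray(dj, dtype=float)
    costs = np.array([np.sum((d0 - np.roll(dj, -s)) ** 2) for s in range(len(d0))])
    s0 = int(np.argmin(costs))
    return costs[s0], s0


def recognize(d0, reference_db):
    """reference_db[k] is a (J, N_D) array: one training descriptor per geosphere point.

    Returns (k, R_k, j, gamma_deg): the most similar reference object, its
    similarity value, the index of the best pose and the in-plane angle.
    """
    best = None
    for k, descriptors in enumerate(reference_db):
        results = [criterion(d0, dj) for dj in descriptors]
        values = np.array([value for value, _ in results])  # criterion vector for object k
        j = int(np.argmin(values))
        r_k = values[j]
        if best is None or r_k < best[1]:
            gamma = 360.0 * results[j][1] / len(d0)
            best = (k, r_k, j, gamma)
    return best
```

For helicopters, descriptors rendered from the models with and without the propeller can simply be stacked into the same entry of reference_db, so that Rk takes the minimum over both.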
As the suggested algorithm is based on image contour description, an ambiguity problem arises. It occurs when a calculated descriptor corresponds to more than one orientation. For instance, the topside and underside views of an airplane produce identical object contours and hence identical descriptors. This problem must be taken into account when solving the orientation estimation task, but it is not important for the recognition task.
Another problem concerns particular object types and is caused by some image peculiarities. Images of helicopters often do not show the propeller. This happens, for example, when the distance is large and the light source is situated behind the observer. In this case, the difference between the object contour descriptor and the reference descriptors is unacceptably high (Fig. 4e-f).
We propose to use two different models for reference descriptor generation: the first helicopter model includes the propeller, and the second does not. Example images of the aircraft obtained using this approach and the corresponding contour descriptors are presented in Fig. 4.

Hardware implementation
Since the developed algorithm will be used in onboard vision systems, it must fit the system structure. The target systems consist of a DSP or CPU as a control unit and a number of FPGAs as computing units. The FPGAs perform most of the "heavy" operations, such as spatial and temporal image filtering, geometric and spectral transformations, template matching and thresholding, and binary image labeling. The DSP/CPU performs unique operations on small amounts of data, FPGA dispatching, and internal control.

Fig. 4. Steps of contour descriptor calculation: a - the rendered image of the helicopter with the propeller, b - the rendered image of the helicopter without the propeller, c - the binary image of the helicopter with the propeller, d - the binary image of the helicopter without the propeller, e - the contour descriptor of the helicopter with the propeller, f - the contour descriptor of the helicopter without the propeller
The object detection algorithm was designed for the described system structure. The object recognition algorithm cannot be performed entirely onboard. Its learning stage should be performed on an external PC, since it includes such specific operations as 3D object rendering; moreover, the learning stage has no time restrictions. In contrast, the second stage of the proposed algorithm is performed onboard in real time. An FPGA-based vision system achieves its highest performance with pipelined processing. Therefore, we suggest that the algorithms are suitable for vision systems based on Xilinx Virtex-5 or higher FPGAs [12].

Experimental research
The first goal of the research is to determine the ability of the algorithm to localize objects at a distance of several kilometers. The video database contained 12 grayscale video sequences with 7 different types of aircraft, three types of UAVs, and two helicopters. Objects were observed against a cloudy background and in clear sky conditions. These sequences were obtained from a single TV or IR camera with a wide field of view. The size of objects varied from about 3×3 pixels to 200×200 pixels and even larger. The quality of the algorithm depends on confident detection of objects in the observed images. The true positive ratio Pt and the false negative ratio Pf are measured for fixed detection algorithm parameters. The reference object position and size are determined in each video frame by visual inspection. Additionally, the standard deviations of the object coordinate measurement error σc and of the size measurement error σs are estimated.
To obtain more relevant results, σs is normalized by the reference size and expressed in percent. The results are summarized in Table 1. The algorithm is less reliable in detecting helicopters because their rotary wings are not always distinguishable. In some cases, the shape of the object varies very rapidly due to changes in the viewing angle, which also causes object misses.
Many algorithms have been developed for aerial object detection. In [13], the authors adapted the Viola-Jones algorithm for aircraft detection in video sequences; the probability of true detection ranged from 84.3 % to 89.1 %, depending on background conditions. The approach developed at Carnegie Mellon University [14] focuses on detecting small unmanned aerial vehicles at a distance of about 3 miles from the image sensor. It provides a detection probability of more than 90 % with a false negative ratio not exceeding 5 %, but it is not designed for detecting large objects. A closer analogue of the developed algorithm is the multi-step approach described in [15], which provides detection and classification of aerial objects. That algorithm demonstrated high detection quality; however, its test video sequences contained only aircraft against a relatively low-contrast background. The effectiveness of the developed algorithm is comparable with these analogues, and in some cases it achieves better results.
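The detection accuracy figures can be computed from per-frame results as sketched below. The sketch interprets Pt as the fraction of frames in which the object is detected, Pf as the fraction of frames in which it is missed, σc as the standard deviation of the coordinate error pooled over x and y, and σs as the standard deviation of the size error normalized by the reference size, in percent. These definitions are an interpretation of the text, not a confirmed evaluation protocol, and the function name is hypothetical.

```python
import numpy as np


def detection_metrics(detected, est_xy, ref_xy, est_size, ref_size):
    """Per-frame detection statistics: (Pt, Pf, sigma_c in pixels, sigma_s in percent)."""
    det = np.asarray(detected, dtype=bool)
    pt = det.mean()   # share of frames with the object detected
    pf = 1.0 - pt     # share of frames with the object missed
    err_xy = np.asarray(est_xy, dtype=float)[det] - np.asarray(ref_xy, dtype=float)[det]
    sigma_c = err_xy.std()  # pooled over x and y, in pixels
    ref = np.asarray(ref_size, dtype=float)[det]
    rel = (np.asarray(est_size, dtype=float)[det] - ref) / ref
    sigma_s = 100.0 * rel.std()  # percent of the reference size
    return pt, pf, sigma_c, sigma_s
```

Errors are evaluated only on frames where the object was detected, since missed frames have no measured position or size.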
The second goal of the experimental research was to study the performance and accuracy of the proposed recognition algorithm in comparison with published works.
In [16], the authors propose a mixed approach that uses three types of indicators and a neural network; the resulting true positive ratio is between 82 % and 94 %. In [17], a classifier based on Markov random fields is used; the resulting true positive ratio is between 88 % and 95 %. In [18], the authors propose recognition of military airplanes using wavelet-based descriptors; the resulting true positive ratio is about 96 %.
The experiments were carried out on the same natural image sequences that were used to examine the object detection algorithm. The minimal aerial object area was 500 pixels, and the maximal aerial object area was less than 15 % of the image area.
The reference object base includes 17 objects defined by 3D models. A set of reference images was rendered for every model using a factor 3 geosphere point distribution (92 points). The light source was situated in front of the object. Examples of object recognition are presented in Fig. 5. The quality of object recognition was estimated using the true positive recognition ratio, averaged over the entire test video set. The results of the experiments are shown in Table 2.

Conclusion
The proposed algorithms are suited for the detection and recognition of aerial vehicles observed against a cloudy background under regular daylight conditions. Experiments show that objects can be detected with good quality at a distance of several kilometers. The accuracy of matching and recognition exceeds 90 % on average but depends on the object type and its orientation in space. The proposed algorithm is focused on reducing computational complexity and can be used in an airborne vision system installed on a UAV. In the future, additional research will be carried out to implement the algorithm in actual vision systems.

Computer Optics, 2017, Vol. 41(4)

Fig. 3. Steps of contour descriptor calculation: a - the input query image of an aircraft, b - the binary image with the extracted contour and an example of the sector matched to the first descriptor unit, c - the binary image translated into the polar coordinate frame, d - the external contour descriptor

Object matching. Small targets are matched by minimizing the relative differences in average brightness, size, and position between object candidates found in a new frame. For larger objects, contour coordinates are very valuable for tracking and recognition. However, contour coordinates carry redundant information, and the values themselves are not invariant to geometric transformations. Therefore, more relevant contour descriptors are used.

Fig. 5. An example of object recognition: a - the object source image, b - the object binary image with the external contour, c - the most similar reference object image

Table 1. Object detection results grouped by type

Table 2. Object recognition results