Learning-based risk assessment and motion estimation by vision for unmanned aerial vehicle landing in an unvisited area

Abstract. We proposed a vision-based methodology as an aid for an unmanned aerial vehicle (UAV) landing on a previously unsurveyed area. When the UAV was commanded to perform a landing mission in an unknown airfield, the learning procedure was activated to extract the surface features for learning the obstacle appearance. After the learning process, while hovering the UAV above the potential landing spot, the vision system would be able to predict the roughness value for confidence in a safe landing. Finally, using hybrid optical flow technology for motion estimation, we successfully carried out the UAV landing without a predefined target. Our work combines a well-equipped flight control system with the proposed vision system to yield more practical versatility for UAV applications.


Introduction
Unmanned aerial vehicles (UAVs) are widely used in many fields, from military to civilian to commercial. For the sake of efficiency and convenience, a higher degree of autonomy is required to minimize human intervention. Many efforts have been made to develop vision-based technologies for UAV maneuvers. One challenge of this technique lies in how to land a UAV on a previously unvisited area. Generally, the issue can be divided into two parts: identifying a flat area for safe landing (landing risk assessment) and landing accurately on an unknown spot (motion estimation).
For landing site selection, the goal is to find a planar surface with a small slope that is free of obstacles. The landing risk can be assessed either by constructing an elevation map or by evaluating the planarity of the terrain appearance. The former approach builds a full three-dimensional (3-D) geometry [1] with the corresponding coordinates of the environment from a sequence of images, e.g., structure from motion (SFM) [2-5]. From the estimated topographical information, a two-dimensional elevation map is obtained, and the flatness of a region is evaluated by, for instance, least median squares [6] or plane fitting [7]. Combined with the coordinates of the surroundings, this information helps locate a likely landing site. SFM and related methods provide an image-based 3-D scene and can determine a safe landing site, but at the cost of heavy computation. Alternatively, less complete but effective schemes detect the absence of obstacles directly. The planar area can be extracted through homography estimation [8,9]. Without the need for a region extraction process, roughness estimation from the optical flow field was proposed to measure the planarity of the surface [10]. In addition, researchers have applied learning to yield more practical versatility for UAV applications, such as supervised learning for texture classification [11], neural network policy for navigation [12], and deep reinforcement learning for marker recognition [13]. Regarding landing risk, a self-supervised learning (SSL) method was employed to overcome the constraint that significant movement is required for optical-flow-based roughness estimation [14].
After determining an adequate landing site in an unvisited area, the next step is to complete the landing process guided by either a positioning system [e.g., global positioning system (GPS)] or a vision-based motion estimation system. In vision-based landing schemes, several patterns have been designed as markers to tackle the close-range and nighttime detection problems during UAV descent [15-18]. Moreover, for landing on a moving target, schemes that either optimize the marker detection rate [19] or exploit the moving target's dynamic model [20] were developed. However, the performance of these schemes relies mainly on a specific target pattern, so they are unlikely to be applicable in an unvisited environment where there is no chance to set up a well-defined landing guide in advance.
Our team aims to develop a fully vision-based system for UAV landings in a previously unvisited environment. Building on our previous work on vision-based landing motion estimation [21], we further integrated the vision system with a learning algorithm. The major function of the proposed system is to classify the obstacle appearance on the ground and provide an accurate measure of motion during the landing. To achieve these aims, we introduced SSL to model the relationship between visual appearance and surface roughness and developed a classifier that decides from the predicted roughness whether the ground is safe for landing (a yes/no question). Moreover, the hybrid optical flow scheme was employed to provide motion estimation throughout the entire landing process without prior knowledge of guiding markers. The remainder of this paper is organized as follows. In Sec. 2, we explain the concept of roughness estimation as well as the methodology for landing site identification. In Sec. 3, we introduce the hybrid framework for visual motion detection, including the multiscale strategy for positioning to tackle the field-of-view problem during descent. Experimental verification is given in Sec. 4. Finally, conclusions and future work are given in Sec. 5.

Learning of Obstacle Appearance
SSL is a classic approach that uses input signals as the sources for supervision. Instead of relying on human intervention, the training labels are determined from the collected data. Therefore, to learn the obstacle appearance in view, we must gather the visual cues as the input objects and the surface planarity as the corresponding supervised output values. Figure 1 shows the process of the learning algorithm for the obstacle appearance. Phase I: since the area is previously unvisited, we first navigated the UAV to capture images as the clustering dataset. The texton dictionary [22,23] was then built to describe the surface texture features. Phase II: we collected the training data, including the surface roughness measured from the optical flow field and the texton distribution formed by matching randomly selected patches against the labeled textons. We used regression to model the relationship between the surface roughness and the texton distribution. Phase III: after completing the learning step, the UAV is able to identify obstacles in a still image through the predicted roughness, thereby ensuring a safe landing in the unvisited area. To save computational effort, the image-processing algorithms were applied only within the region of interest (ROI) of the input image stream. Details are explained in the following subsections.

Patch Operation for Visual Appearance
In this study, we used the texton method [24] to characterize the visual appearance of an unvisited airfield. By clustering the characteristic values from multiple image patches, we can generate a texton dictionary that represents the surface texture features. In our implementation, each 3 × 3 image patch was rearranged into a 1 × 9 vector $x$ of its grayscale values. Then these vectors were partitioned into $k$ sets $S = \{S_1, \ldots, S_k\}$ by a K-means clustering algorithm [25], where each vector belongs to the cluster with the nearest mean. The objective function can be defined as
$$
\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2, \tag{1}
$$
where $\mu_i$ is the mean of observations in cluster $S_i$. With this method, the cluster centroids formed a visual dictionary for the unvisited area. After creating the texton dictionary, the features of an image can be characterized by a texton probability distribution, as shown in Fig. 2. For each randomly extracted patch, we searched for the closest match in the dictionary based on the Euclidean distance and added it to the corresponding bin in a histogram. Finally, we obtain the texton distribution $q$ by normalizing with the number of extracted patches $m$ as follows:
$$
q = \frac{1}{m}\,[h_1, h_2, \ldots, h_k], \tag{2}
$$
where $h_j$ is the number of extracted patches whose closest match is the $j$th texton in the dictionary.
Fig. 1 The overview of learning the obstacle appearance. Phase I: creating a texton dictionary for the unvisited area based on the captured images during navigation. Phase II: collecting the training data, texton distribution, and surface roughness through the image sequence in the second navigation. Phase III: hovering above the test area (red star) and predicting the surface roughness using the SSL method.
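The patch operation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in the experiments: the function names, the random test image, the number of sampled patches, and the plain Lloyd iteration are all assumptions introduced here; only the 3 × 3 patch size, the Euclidean matching, and k = 30 come from the text.

```python
import numpy as np

def extract_patches(image, n_patches, rng, size=3):
    """Randomly sample size x size patches and flatten each into one row vector."""
    h, w = image.shape
    rows = rng.integers(0, h - size + 1, n_patches)
    cols = rng.integers(0, w - size + 1, n_patches)
    return np.stack([image[r:r + size, c:c + size].ravel()
                     for r, c in zip(rows, cols)]).astype(float)

def build_texton_dictionary(vectors, k, rng, n_iter=20):
    """Plain Lloyd's K-means (assumed variant); returns the k centroids (textons)."""
    centroids = vectors[rng.choice(len(vectors), k, replace=False)].copy()
    for _ in range(n_iter):
        # Distance of every patch vector to every centroid.
        dist = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for i in range(k):
            members = vectors[labels == i]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[i] = members.mean(axis=0)
    return centroids

def texton_distribution(vectors, dictionary):
    """Histogram of nearest-texton matches, normalized by the patch count m."""
    dist = np.linalg.norm(vectors[:, None, :] - dictionary[None, :, :], axis=2)
    hist = np.bincount(dist.argmin(axis=1), minlength=len(dictionary))
    return hist / len(vectors)

# Hypothetical data: a random grayscale image stands in for a captured frame.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
dictionary = build_texton_dictionary(extract_patches(image, 500, rng), k=30, rng=rng)
q = texton_distribution(extract_patches(image, 200, rng), dictionary)
```

The resulting vector `q` has one entry per texton and sums to one, so it can be compared across images regardless of how many patches were sampled.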

Surface Roughness using Optical Flow Algorithm
To determine whether the selected spot is suitable for landing, we estimated the surface roughness as the measure of landing safety. The concept of roughness estimation is to regard the optical flow components as a set of points for a plane fitting problem, where the fitting error is adopted as the measure of roughness. Based on previous research in Ref. 10, the camera model based on the optical flow algorithm shall satisfy the following conditions: (1) downward-looking camera, (2) planar surface in sight, and (3) known angular rates of the camera. Under these assumptions, the optical flow vectors can be generalized as follows:
$$
\begin{aligned}
u &= -u_0 + (\alpha u_0 + w_0)\,x + \beta u_0\,y - \alpha w_0\,x^2 - \beta w_0\,xy,\\
v &= -v_0 + \alpha v_0\,x + (\beta v_0 + w_0)\,y - \alpha w_0\,xy - \beta w_0\,y^2,
\end{aligned} \tag{5}
$$
where $u$ and $v$ are the optical flow components in the $x$ and $y$ directions of the image coordinate system, respectively; $u_0$, $v_0$, and $w_0$ are the corresponding velocities in the $x$, $y$, and $z$ directions scaled with respect to the altitude; and $\alpha$ and $\beta$ are the tangents of the slope angles of the surface. According to Eq. (5), the magnitude of the optical flow is inversely proportional to the flight height above the surface. Therefore, we can estimate the surface roughness by fitting the optical flow field. Since the UAV moved nearly laterally ($w_0 \approx 0$), the quadratic terms vanish and Eq. (5) can be simplified to Eq. (6). The parameter vectors $p_u$ and $p_v$ can be calculated separately by solving a linear fitting problem within a random sample consensus (RANSAC) procedure.
$$
u = p_u\,[1, x, y]^{T}, \qquad v = p_v\,[1, x, y]^{T}. \tag{6}
$$
RANSAC iterations gave us the estimated surface plane and the corresponding fitting error, serving as the measure of surface roughness (Fig. 3). If there exists any obstacle on the surface, the procedure leads to a higher fitting error in the $u$ direction, the $v$ direction, or both. Consequently, we combined the results in both directions as the overall surface roughness.
$$
\varepsilon_{opt} = \varepsilon_u + \varepsilon_v, \tag{7}
$$
where $\varepsilon_u$ and $\varepsilon_v$ are the fitting errors in the $u$ and $v$ directions, respectively.
Fig. 2 After the images were split into small patches, the texton dictionary was created by the K-means clustering algorithm (k = 30). Then the texton distribution was obtained through a series of patch matching.
Fig. 3 An example of roughness estimation: in each RANSAC iteration, the parameter vectors were calculated by fitting a plane to a randomly selected subset of data points. The procedure returned the optimal model parameters as well as the fitting error, which can be interpreted as the measure of surface roughness.
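The roughness estimate described in this subsection can be illustrated with a small NumPy sketch. It is not the authors' code: the sample size, iteration count, inlier tolerance, the use of the mean absolute residual as the fitting error, and the synthetic flow fields are all assumptions made here for illustration. The only parts taken from the text are the linear model fitted to each flow component inside a RANSAC loop and the sum of the two fitting errors.

```python
import numpy as np

def ransac_fit_error(x, y, flow, rng, n_iter=50, n_sample=5, tol=0.05):
    """Fit flow ~ p . [1, x, y] by RANSAC; return (p, mean absolute residual)."""
    A = np.column_stack([np.ones_like(x), x, y])
    best_p, best_count = None, -1
    for _ in range(n_iter):
        idx = rng.choice(len(x), n_sample, replace=False)
        p, *_ = np.linalg.lstsq(A[idx], flow[idx], rcond=None)
        count = int((np.abs(A @ p - flow) < tol).sum())  # inliers of this model
        if count > best_count:
            best_p, best_count = p, count
    # Assumed error definition: mean absolute residual over all flow vectors.
    return best_p, float(np.abs(A @ best_p - flow).mean())

def surface_roughness(x, y, u, v, rng):
    """Combine the fitting errors of both flow components."""
    _, eps_u = ransac_fit_error(x, y, u, rng)
    _, eps_v = ransac_fit_error(x, y, v, rng)
    return eps_u + eps_v

# Synthetic example: a flat surface produces flow that is exactly linear in (x, y).
rng = np.random.default_rng(1)
x, y = rng.uniform(-1, 1, 200), rng.uniform(-1, 1, 200)
u_flat = -0.5 + 0.10 * x + 0.05 * y
v_flat = -0.2 + 0.04 * x + 0.02 * y
rough_flat = surface_roughness(x, y, u_flat, v_flat, rng)

# A raised obstacle is closer to the camera, so its flow is larger (factor 2 assumed).
obstacle = (x > 0.3) & (y > 0.3)
u_obs = np.where(obstacle, 2.0 * u_flat, u_flat)
v_obs = np.where(obstacle, 2.0 * v_flat, v_flat)
rough_obstacle = surface_roughness(x, y, u_obs, v_obs, rng)
```

In this toy setting the flat field gives a roughness near zero, while the field containing the obstacle gives a clearly larger value, which is the behavior the plane-fitting argument predicts.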
Journal of Electronic Imaging 063011-3 Nov/Dec 2019 • Vol. 28 (6)

Self-Supervised Learning
In the previous section, we introduced the concept of roughness estimation using the optical flow technique. However, since the fitting dataset consists of velocity vectors, the roughness estimation requires significant movement to yield a reliable result, which is not viable in the hovering mode. To ensure the accuracy of obstacle prediction, we proposed an SSL scheme to map visual appearance features $q$ to roughness values $\varepsilon_{opt}$. In this study, K-nearest neighbor regression [26] was used as the learning method due to its simplicity and flexibility. The algorithm is a nonparametric method that keeps all available data and predicts the numerical response based on a proximity measure. A dataset of $n$ training samples is given as follows:
$$
D = \{(q_i, \varepsilon_{opt,i})\}_{i=1}^{n}. \tag{8}
$$
The algorithm performed predictions by calculating the similarity between the input sample $q_0$ and each instance of the training data. Finally, the predicted roughness was obtained as the mean of the $K$ neighbors' responses.
$$
\varepsilon_{SSL} = \frac{1}{K} \sum_{i \in N_K} \varepsilon_{opt,i}, \tag{9}
$$
where $N_K$ denotes the $K$ objects in the database that are closest to the input $q_0$. After completing the learning process, we can estimate the surface roughness corresponding to any input distribution through the regression model.
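As a concrete illustration of the regression step, the following sketch predicts the roughness of a new texton distribution as the mean response of its K nearest training samples. The Euclidean distance as the proximity measure, the function name, and the tiny two-bin training set are assumptions chosen for readability; the text does not state which distance was used.

```python
import numpy as np

def knn_predict_roughness(q0, train_q, train_eps, k=3):
    """Mean roughness of the k training distributions closest to q0."""
    dist = np.linalg.norm(train_q - q0, axis=1)  # assumed proximity measure
    nearest = np.argsort(dist)[:k]               # indices of the k neighbors
    return float(train_eps[nearest].mean())

# Hypothetical training set: two-bin texton distributions with roughness labels.
train_q = np.array([[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
                    [0.20, 0.80], [0.10, 0.90], [0.15, 0.85]])
train_eps = np.array([0.05, 0.07, 0.06, 0.30, 0.34, 0.32])

eps_ssl = knn_predict_roughness(np.array([0.88, 0.12]), train_q, train_eps, k=3)
```

Because the query lies next to the three low-roughness samples, the prediction is their mean, 0.06. No flow field is needed at prediction time, which is why the method remains usable while hovering.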

Vision Motion Estimation for Landing
Once the landing site was selected, the UAV was commanded to descend steadily until touchdown. In an unvisited environment, a marker-based vision system may fail because no recognizable marker is available. Therefore, we introduced a hybrid framework that selects the processed image frames and algorithms according to the required estimates, thus meeting the need to land in an unvisited environment with no target recognition required. In addition, an augmented phase correlation (PC) method with a multiscale strategy was introduced to tackle the problem of scale variation resulting from the field-of-view change during descent.

Hybrid Optical Flow Technology
The hybrid optical flow technique, which correlates multiple image frames, was proposed to measure two motion quantities for landing control: velocity and position. As shown in Fig. 4, the velocity is estimated by comparing two consecutive frames ($I_N$ and $I_{N-1}$). The position, on the other hand, is computed from the deviation between the reference frame and the $N$th frame ($I_N$ and $I_0$) to avoid the accumulation of integration error. The image processing for motion estimation is also shown in Fig. 4. The vision system generated the guidance information using the flight data from the control system. Following our previous work [21], we adopted the Gunnar Farnebäck algorithm [27] and the PC method [28-30] to obtain the velocity and position, respectively. These two algorithms worked together to combine the advantages of dense and sparse optical flow in terms of accuracy and robustness. Both the velocity and position measurements were employed as feedback signals to the flight control system.
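To make the position branch of the hybrid scheme concrete, the sketch below recovers the integer pixel shift between a reference frame and a later frame with a basic phase correlation written in NumPy. It is an illustration under simplifying assumptions (pure cyclic translation, no windowing, no subpixel refinement, synthetic image) and not the implementation flown in the experiments. In practice, OpenCV offers `cv2.phaseCorrelate` for this step and `cv2.calcOpticalFlowFarneback` for the dense flow used in the velocity branch.

```python
import numpy as np

def phase_correlation_shift(reference, sensed):
    """Return integer (dy, dx) with sensed ~ np.roll(reference, (dy, dx), axis=(0, 1))."""
    cross = np.fft.fft2(sensed) * np.conj(np.fft.fft2(reference))
    cross /= np.abs(cross) + 1e-12            # keep only the phase difference
    corr = np.fft.ifft2(cross).real           # peak location encodes the shift
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

# Synthetic check: shift a random image by a known amount and recover it.
rng = np.random.default_rng(2)
reference = rng.random((64, 64))
sensed = np.roll(reference, (3, -5), axis=(0, 1))
shift = phase_correlation_shift(reference, sensed)  # (3, -5)
```

Comparing every frame against the same reference, as here, gives a position estimate whose error does not grow with the number of frames, unlike integrating frame-to-frame velocities.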

Multiscale Strategy
The typical PC approach can tolerate only a small scale difference between two image frames. During the landing process, however, the vision-based motion estimation experiences a large scale difference due to the decrease in height. Accordingly, in the previous work, we introduced a multiscale strategy to autonomously adjust the sensed ROI and update the reference ROI for relative position estimation. Figure 5 shows the concept of the multiscale PC method. First, the vision system set the reference ROI ($I_0$) at the instant that the UAV was commanded to descend. As the UAV was descending, the size of the extracted ROI was enlarged by a scale factor $\lambda$. Then the sensed ROI ($I_N$) was obtained by resizing with the same factor to maintain the same scale as the reference ROI. Finally, the sensed ROI and the reference ROI were passed to the PC function. The factor $\lambda$ can be computed from the height of the UAV:
$$
\lambda = \frac{H_{ref}}{H}, \tag{10}
$$
where $H_{ref}$ and $H$ are the flight heights of the reference image and the sensed image, respectively. In our experiments, the flight height was obtained by a laser altimeter. In addition, while the UAV was descending, a timely update of the reference ROI image was necessary to ensure that it contained a sufficient overlap region for ROI extraction in subsequently captured images. The reference ROI was autonomously updated when $\lambda$ reached a preset threshold value. It is noted that, when passing to the next epoch of the reference ROI, the position estimate was carried out with the prior measurement as an initial condition.
Fig. 4 The conceptual framework of visual motion estimation for landing: the velocity is obtained by comparing image patterns on two consecutive frames based on the Gunnar Farnebäck algorithm, and the position is computed by the proposed multiscale PC method with respect to a reference image.
Augmenting the PC method with the proposed multiscale strategy effectively reduced the sensing error, as demonstrated by our experimental results.
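The bookkeeping behind the multiscale strategy can be sketched as follows, taking the scale factor as the ratio of the reference height to the current height. The ROI size of 100 pixels, the update threshold of 1.5, the height profile, and the function names are hypothetical values chosen for illustration; the text states only that the reference ROI is updated when the factor reaches a preset threshold.

```python
def scale_factor(h_ref, h):
    """Ratio of the reference height to the current height."""
    return h_ref / h

def track_descent(heights, ref_size=100, lam_max=1.5):
    """For each height, log (height, lambda, sensed ROI size, reference updated)."""
    h_ref = heights[0]  # reference ROI is set when the descent command is issued
    log = []
    for h in heights:
        lam = scale_factor(h_ref, h)
        roi_size = round(ref_size * lam)  # extracted ROI grows as the UAV descends
        updated = lam >= lam_max          # hypothetical update rule
        log.append((h, round(lam, 3), roi_size, updated))
        if updated:
            h_ref = h                     # new reference epoch starts at this height
    return log

# Hypothetical laser-altimeter readings in meters during a descent.
log = track_descent([10.0, 8.0, 6.0, 5.0, 4.0, 3.0])
```

Each extracted ROI would then be resized back to `ref_size` before phase correlation, so both inputs share one scale. At every reference update, the last position estimate would be carried over as the offset of the new epoch.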

Experimental Results
In this section, we report the experiments on vision-aided landing in a previously unsurveyed area. First, we compared the SSL method with the optical flow method in roughness estimation. Then we conducted the UAV landing with the hybrid optical flow scheme. It is noted that the learning process was carried out before these experiments. The test UAV was a commercial quadrotor, Stellar 1000× from InnoFlight™, as shown in Fig. 6. It came equipped with a flight control system (InnoFlight™ Jupiter JM-1 Autopilot), an inertial measurement unit, a laser altimeter, and a GPS module. For vision, the image processing system included an NVIDIA™ Jetson TK1 module and a GoPro™ HERO 3+ camera. The embedded program in the vision computer executed the image processing algorithms and communicated with the flight control computer, which ran a proportional-integral-derivative control scheme.

Obstacle Detection using Self-Supervised Learning
In this experiment, we hovered the UAV above several spots (marked by the red star in Fig. 1), 10 s per spot, and examined the capability of the vision system for obstacle detection. Area 1 and area 2 contained obstacles (buildings and cars, respectively), whereas the other two spots (area 3 and area 4) had no obstacles. We computed the mean and the standard deviation of the roughness estimated by the optical flow algorithm ($\varepsilon_{opt}$) and by the SSL method ($\varepsilon_{SSL}$) and inferred the classification rule for a safe landing site from these results.
Fig. 6 [...] (... Fig. 7, with permission). In addition to the preinstalled flight control system, the quadrotor is equipped with a vision computer and a HERO 3 external camera.
Figure 7 shows the estimated surface roughness while the UAV was hovering. As we predicted in Sec. 2.3, the accuracy of surface roughness estimation via optical flow depends strongly on the extent of the UAV movement. In the absence of lateral movement, the optical flow vectors fail to reveal the magnitude differences caused by surface fluctuation. As can be seen from the results of the optical flow method (top-left figure), the roughness estimates failed to cluster and thus discriminated poorly between the classes. In contrast, the roughness predicted by the SSL method provided a better evaluation (bottom-left figure). The roughness distribution can be entirely separated into two groups, corresponding to regions with and without obstacles. The results yield a clear threshold ($\varepsilon_{SSL} < 0.17$) below which the proposed vision system identifies a safe landing area (low roughness value). In addition, the distribution shows that the larger the area occupied by the obstacles, the higher the roughness value.
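The classification rule inferred from these results can be written as a short decision function. This is a sketch under assumptions: the text reports the threshold of 0.17 but does not say whether it is applied to each prediction or to the mean over the hovering window, so the mean is used here, and the sample values below are invented for illustration, not measured data.

```python
from statistics import mean, stdev

SAFE_THRESHOLD = 0.17  # roughness threshold reported in the text

def assess_spot(predictions, threshold=SAFE_THRESHOLD):
    """Summarize SSL roughness predictions from one hover and flag the spot."""
    m = mean(predictions)
    return {"mean": m, "std": stdev(predictions), "safe": m < threshold}

# Hypothetical predictions collected while hovering over two different spots.
clear_spot = assess_spot([0.10, 0.12, 0.11, 0.13])
obstacle_spot = assess_spot([0.25, 0.30, 0.28, 0.27])
```

Reporting the standard deviation alongside the mean mirrors the statistics computed in the experiment and gives a quick check on how stable the prediction is during the hover.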

Landing Controls with Hybrid Motion Estimation
In preparation for landing (hovering phase), the UAV was commanded to hold its position above the likely landing spot. After confirming that no obstacles were in view, the UAV started to descend steadily with visual feedback from the hybrid optical flow technique. While the visual motion estimation was in effect, the system simultaneously collected GPS data as the benchmark. In this work, the position accuracy was verified using the template matching method [31,32] over multiple flight trials. A video demo can be found in Ref. 33, whereas Figs. 8(a) and 8(b) show the flight data and the in-plane route during the landing process. In terms of velocity estimation, the precision of the vision-based method was comparable with that of GPS, with only a 0.1 m/s difference in both the x and y directions. The vision-based landing was activated at $P_0$ (black cross), i.e., the target spot set at the coordinate origin. At the end of the landing, the UAV was located at $P_{vision}$ (blue dot) and $P_{gps}$ (green dot) according to the vision system and GPS, respectively. The corresponding camera-view images are presented in Fig. 8(c). For position, the location $P_g$ (red cross) detected by template matching was taken as the ground truth, and its coordinates are also indicated in Fig. 8(b). Overall, the vision-based landing resulted in ∼0.1 m of in-plane positioning error. These results suggest that the hybrid vision-based scheme is able to guide the UAV landing precisely without prior information about a particular marker.

Conclusions
In this paper, we proposed a vision-based system to aid UAV landing in an unsurveyed environment. The overall procedure involved assessing the safety of the landing spot and completing the landing at that location. To assess the landing risk, the system used an SSL algorithm to construct the regression model for roughness estimation. The texton distribution formed by patch matching was used to represent the visual features in view, and the roughness was used to determine whether the ground underneath was a safe landing site. For a newly acquired distribution, the proposed system obtained the roughness through the regression model and then classified the presence of obstacles. Compared with the pure optical flow method, the SSL method allowed the UAV to estimate the roughness in the hovering phase, which is more practical in UAV landing operations. Then we applied the visual motion estimation framework for landing proposed in our previous work, including a multiscale strategy that tackles the problem of scale variation during descent. With this method, we can land a UAV without a well-defined target on the ground. In addition, the experimental results indicated that we successfully carried out the vision-based autonomous landing with a positioning error of ∼0.1 m. A detailed discussion of landing performance can be found in our previous work [21]. To make the vision system more complete for UAV applications, further effort is needed on the problem of retrieving depth information visually. Moreover, online learning would make the UAV more versatile by letting it autonomously select a safe landing spot. In this way, a fully vision-based system can be employed to implement UAV autolanding in an unsurveyed environment.