Artificial Marker and MEMS IMU-Based Pose Estimation Method to Meet Multirotor UAV Landing Requirements

The fully autonomous operation of multirotor unmanned air vehicles (UAVs) in many applications requires support of precision landing. Onboard camera and fiducial marker have been widely used for this critical phase due to its low cost and high effectiveness. This paper proposes a six-degrees-of-freedom (DoF) pose estimation solution for UAV landing based on an artificial marker and a micro-electromechanical system (MEMS) inertial measurement unit (IMU). The position and orientation of the landing maker are measured in advance. The absolute position and heading of the UAV are estimated by detecting the marker and extracting corner points with the onboard monocular camera. To achieve continuous and reliable positioning when the marker is occasionally shadowed, IMU data is fused by an extended Kalman filter (EKF). The error terms of the IMU sensor are modeled and estimated. Field experiments show that the positioning accuracy of the proposed system is at centimeter level, and the heading error is less than 0.1 degrees. Comparing to the marker-based approach, the roll and pitch angle errors decreased by 33% and 54% on average. Within five seconds of vision outage, the average drifts of the horizontal and vertical position were 0.41 and 0.09 m, respectively.


Introduction
With the rapid development of micro-electromechanical systems (MEMSs) and high-performance flight control processors, applications and research of multirotor unmanned air vehicles (UAVs) have received increasing attention worldwide. Due to the unique structure and stability, multirotor UAVs are widely used in precision agriculture [1,2], rescue [3], surveillance [4,5], etc. These applications also pose new challenges for the precision landing of UAVs [6]. To land safely and precisely, the localization of UAV requires high accuracy, continuity, and reliability. Global navigation satellite system/inertial navigation system (GNSS/INS)-integrated navigation [7] technologies are the most widely used navigation methods for various UAVs [8,9]. However, the precision of the integrated navigation positioning is completely dependent on the GNSS. Although decimeter-level positioning results can be obtained by a consumer-grade dual-frequency GNSS receiver with sufficient satellites [10,11], in areas with low satellite observation such as city canyons and forests, positioning errors will increase. Moreover, it is totally unavailable for indoor applications. To ease these problems, vision-based pose estimation approaches for UAV landing have attracted extensive research due to their low cost and ability to sense the environment [12,13]. A survey of the vision-based landing techniques for UAVs is presented in [14]. In most applications, e.g., precision agriculture, we predicted by INS and updated in the EKF. Field tests illustrate the magnificent performance of our system. It worth mentioning that this work only focuses on the pose estimation for the UAV landing instead of on the whole closed-loop control system. This paper is organized as follows. In Section 2, the design and detection of the marker as well as the relative state estimation method with the extracted marker is presented. Then the fusion algorithm of pose information from the marker and IMU data is described. Experimental evaluations of the proposed system are provided and discussed in Section 3. Finally, Section 4 draws some conclusions.

Marker Detection
Similar to previous works, the marker proposed in this work is a big square consisting of small black and white squares. They are distributed on a plane according to a certain rule with the advantages of large information capacity and good expandability.
The marker is constituted by N × N black and white squares of the same size. The number and size of the squares are dependent on the size of the dataset, as well as the amount of encoded information. The border of the marker is black with several black squares properly filled inside. To avoid the central symmetry of the marker, which leads to ambiguous heading definition, it must be ensured that the ultimate marker is not centrosymmetric when determining internal black squares. Our marker is specifically designed for UAV landing, reducing the computational cost while limiting false positives. Unlike the aforementioned markers, the structure inside the border of the proposed marker is much simple. To meet the requirement of estimating pose at different heights along the landing path especially when the UAV is close to the ground, a black square is set at the marker center. At the beginning, the UAV detects the whole marker and adjust its horizontal position and heading while landing. Once it descends to a certain height where the marker is too large to be detected, the center square can be used for guidance. Thus, N should be an odd number.
A marker database is consequently built for different landing points. Every marker has its own specific number and involves a different command for the UAV. We use a binary bit stream to represent the coding matrix of the marker for fast and accurate matching, with values of 0 for black and 1 for white. The (N − 2) × (N − 2) matrix uniquely represents the encoding information of the marker. Figure 1 shows examples of two different markers (N = 7) and their encoding matrices.
Sensors 2019, 19, x FOR PEER REVIEW 3 of 19 our system. It worth mentioning that this work only focuses on the pose estimation for the UAV landing instead of on the whole closed-loop control system. This paper is organized as follows. Section 2 presents the design and detection of the marker. Section 3 describes the pose estimation process with the detected marker. A fusion method of pose information from the marker and IMU data is presented in Section 4. In Section 5, experimental evaluations of the proposed system are provided and discussed.

Marker Detection
Similar to previous works, the marker proposed in this work is a big square consisting of small black and white squares. They are distributed on a plane according to a certain rule with the advantages of large information capacity and good expandability.
The marker is constituted by N N × black and white squares of the same size. The number and size of the squares are dependent on the size of the dataset, as well as the amount of encoded information. The border of the marker is black with several black squares properly filled inside. To avoid the central symmetry of the marker, which leads to ambiguous heading definition, it must be ensured that the ultimate marker is not centrosymmetric when determining internal black squares. Our marker is specifically designed for UAV landing, reducing the computational cost while limiting false positives. Unlike the aforementioned markers, the structure inside the border of the proposed marker is much simple. To meet the requirement of estimating pose at different heights along the landing path especially when the UAV is close to the ground, a black square is set at the marker center. At the beginning, the UAV detects the whole marker and adjust its horizontal position and heading while landing. Once it descends to a certain height where the marker is too large to be detected, the center square can be used for guidance. Thus, N should be an odd number.
A marker database is consequently built for different landing points. Every marker has its own specific number and involves a different command for the UAV. We use a binary bit stream to represent the coding matrix of the marker for fast and accurate matching, with values of 0 for black and 1 for white. The ( ) ( ) matrix uniquely represents the encoding information of the marker. Figure 1 shows examples of two different markers ( ) and their encoding matrices. The main process of pose estimation by the marker is shown in Figure 2. First, we perform histogram equalization and morphological processing on the captured image to reduce the noise and then execute binarization. The Otsu [30] algorithm is employed since it provides the optimal image threshold given that the pixel distribution is bimodal, which is true in this scenario. We use the Canny [31] operator to extract the contours in the image and the Douglas-Peucker algorithm to fit the contours to a polygon. After removing the unreasonable contours, all the quadrilateral contours are extracted, and the internal pixels are extracted and clustered to match the encoding matrix in the database. After determining the unique matrix, the vertices of the marker on the image are obtained, and all corners inside the marker are extracted [32]. If the entire marker cannot be extracted, the corners are tracked by calculating the optical flow [33]. When the UAV is too close to the marker, the corners of the central black square are extracted. Since the positions of the corner points are known, the position and orientation of the camera can be estimated by a set of correspondences between real The main process of pose estimation by the marker is shown in Figure 2. First, we perform histogram equalization and morphological processing on the captured image to reduce the noise and then execute binarization. The Otsu [30] algorithm is employed since it provides the optimal image threshold given that the pixel distribution is bimodal, which is true in this scenario. We use the Canny [31] operator to extract the contours in the image and the Douglas-Peucker algorithm to fit the contours to a polygon. After removing the unreasonable contours, all the quadrilateral contours are extracted, and the internal pixels are extracted and clustered to match the encoding matrix in the database. After determining the unique matrix, the vertices of the marker on the image are obtained, and all corners inside the marker are extracted [32]. If the entire marker cannot be extracted, the corners are tracked by calculating the optical flow [33]. When the UAV is too close to the marker, the corners of the central black square are extracted. Since the positions of the corner points are known, the position and orientation of the camera can be estimated by a set of correspondences between real world points and their projection on image. This is introduced in Section 2.2. Figure 3a-f presents the image processing results of the proposed system. world points and their projection on image. This is introduced in Section 2.2. Figure 3a-f presents the image processing results of the proposed system.

Pose Estimation by Marker
The relative pose between two coordinate systems can be retrieved by matching the corresponding points. The world coordinate system (w-frame) is established with the center of the marker as the origin. The camera coordinate system (c-frame) is established with the optical center of the camera as the origin, x-axis pointing to the right, y-axis pointing down. The pixel coordinate system (p-frame) is established with the upper left corner of the image as the origin and ( , ) u v parallel to ( , ) x y of the c-frame, respectively. Figure 4 shows the coordinate systems. The camera projection model is shown as Equation (1). The marker can be laid arbitrarily at the landing point, but we have to measure its heading angle in advance for UAV absolute heading estimation. For easy understanding, we assume that the x-axis and y-axis of the w-frame points to the north and east, respectively.  world points and their projection on image. This is introduced in Section 2.2. Figure 3a-f presents the image processing results of the proposed system.

Pose Estimation by Marker
The relative pose between two coordinate systems can be retrieved by matching the corresponding points. The world coordinate system (w-frame) is established with the center of the marker as the origin. The camera coordinate system (c-frame) is established with the optical center of the camera as the origin, x-axis pointing to the right, y-axis pointing down. The pixel coordinate system (p-frame) is established with the upper left corner of the image as the origin and ( , ) u v parallel to ( , ) x y of the c-frame, respectively. Figure 4 shows the coordinate systems. The camera projection model is shown as Equation (1). The marker can be laid arbitrarily at the landing point, but we have to measure its heading angle in advance for UAV absolute heading estimation. For easy understanding, we assume that the x-axis and y-axis of the w-frame points to the north and east, respectively.

Pose Estimation by Marker
The relative pose between two coordinate systems can be retrieved by matching the corresponding points. The world coordinate system (w-frame) is established with the center of the marker as the origin. The camera coordinate system (c-frame) is established with the optical center of the camera as the origin, x-axis pointing to the right, y-axis pointing down. The pixel coordinate system (p-frame) is established with the upper left corner of the image as the origin and (u, v) parallel to (x, y) of the c-frame, respectively. Figure 4 shows the coordinate systems. The camera projection model is shown as Equation (1). The marker can be laid arbitrarily at the landing point, but we have to measure its heading angle in advance for UAV absolute heading estimation. For easy understanding, we assume that the x-axis and y-axis of the w-frame points to the north and east, respectively. The transformation among coordinate systems is given as where c P and w P are the coordinates of the corner points in c-frame and w-frame, respectively; ( , ) u v are the coordinates of the corner points in p-frame; c w R and c w T are the rotation matrix and translation vector from w-frame to c-frame, respectively; Κ is the intrinsic parameter matrix of the camera, which can be calibrated by the chessboard method proposed in [34]; ( , ) x y c c is the camera principal point in p-frame; and x y f f 、 represent the focal length of the camera in terms of pixel dimensions in the x and y direction, respectively. According to the matching pairs of the corners in w-frame and p-frame, the estimation of the UAV pose ( ) problem that can be solved by [35]. The random sample consensus (RANSAC) method [36] is introduced to exclude corner outliers to ensure the robustness of the visual positioning. Then, we regard it as the initial value and perform the Levenberg-Marquardt [37] approach to optimize it according to the minimum reprojection error criterion, shown as where i n ∈ denotes the i th point of the all n points; i ρ indicates the weight factor. To reduce the influence of a corner observation with a large error on the result, we use the Cauchy loss function (Equation (4)) to reduce the weight of this observation [38].
The relative position of the camera w.r.t. the w-frame can be transformed to the absolute pose in the e-frame by the following equations, The transformation among coordinate systems is given as where P c and P w are the coordinates of the corner points in c-frame and w-frame, respectively; (u, v) are the coordinates of the corner points in p-frame; R c w and T c w are the rotation matrix and translation vector from w-frame to c-frame, respectively; K is the intrinsic parameter matrix of the camera, which can be calibrated by the chessboard method proposed in [34]; (c x , c y ) is the camera principal point in p-frame; and f x , f y represent the focal length of the camera in terms of pixel dimensions in the x and y direction, respectively. According to the matching pairs of the corners in w-frame and p-frame, the estimation of the UAV pose (R c w , T c w ) is a perspective-n-point (PNP) problem that can be solved by [35]. The random sample consensus (RANSAC) method [36] is introduced to exclude corner outliers to ensure the robustness of the visual positioning. Then, we regard it as the initial value and perform the Levenberg-Marquardt [37] approach to optimize it according to the minimum reprojection error criterion, shown as where i ∈ n denotes the ith point of the all n points; ρ i indicates the weight factor. To reduce the influence of a corner observation with a large error on the result, we use the Cauchy loss function (Equation (4)) to reduce the weight of this observation [38].
The relative position of the camera w.r.t. the w-frame can be transformed to the absolute pose in the e-frame by the following equations, δr e = Dδr n , where δr e is the position vector from the origin of w-frame to the camera center projected in the e-frame; δr n is the camera coordinates in the n-frame; R M is the meridian radius of curvature; R N is the radius of curvature in the prime vertical; ϕ is the geodetic latitude; and h is the altitude.

Visual/Inertial Fusion
INS is able to calculate UAV pose from the initial state by IMU measurements alone. By detecting the marker, the relative position and heading of the UAV in the w-frame is retrieved which provides reliable external observations for INS to correct its errors. Since the corner points are on the same plane that is parallel to the horizontal plane, the observability of the horizontal angle (i.e., roll and pitch angle) is weak for PNP. For INS, the external position assist and gravity constraint make the horizontal angle observable. Moreover, the IMU data rate is higher than that of the camera in general. INS mechanization can predict the UAV state during two consecutive image frames. Due to the magnificent complementarity, it is valuable to fuse the two sensors' data. The algorithm overview is presented in Figure 5.
where e δ r is the position vector from the origin of w-frame to the camera center projected in the eframe; n δ r is the camera coordinates in the n-frame; M R is the meridian radius of curvature; N R is the radius of curvature in the prime vertical; ϕ is the geodetic latitude; and h is the altitude.

Visual/Inertial Fusion
INS is able to calculate UAV pose from the initial state by IMU measurements alone. By detecting the marker, the relative position and heading of the UAV in the w-frame is retrieved which provides reliable external observations for INS to correct its errors. Since the corner points are on the same plane that is parallel to the horizontal plane, the observability of the horizontal angle (i.e., roll and pitch angle) is weak for PnP. For INS, the external position assist and gravity constraint make the horizontal angle observable. Moreover, the IMU data rate is higher than that of the camera in general. INS mechanization can predict the UAV state during two consecutive image frames. Due to the magnificent complementarity, it is valuable to fuse the two sensors' data. The algorithm overview is presented in Figure 5.

INS Mechanization and System Model
The main procedure of the INS mechanization can be divided into three steps: 1. Integrate the angular rate to update the vehicle attitude; 2. Transform the acceleration to the n-frame by the updated attitude, then integrate it to obtain the vehicle velocity; 3. Integrate the velocity to calculate the vehicle position.
Refer to [39] for the detailed update equations. The demands for an accurate estimation of the vehicle pose necessitate the modeling and online estimation of the error components of the sensor. The state vector of the integrated navigation system constructed in this paper can be expressed as where n δ r is the position error in the navigation frame (n-frame), which has its origin coinciding with that of the b-frame, with its x-axis pointing north, z-axis pointing down, and y-axis completing

INS Mechanization and System Model
The main procedure of the INS mechanization can be divided into three steps: 1.
Integrate the angular rate to update the vehicle attitude; 2.
Transform the acceleration to the n-frame by the updated attitude, then integrate it to obtain the vehicle velocity; 3.
Integrate the velocity to calculate the vehicle position.
Refer to [39] for the detailed update equations. The demands for an accurate estimation of the vehicle pose necessitate the modeling and online estimation of the error components of the sensor. The state vector of the integrated navigation system constructed in this paper can be expressed as where δr n is the position error in the navigation frame (n-frame), which has its origin coinciding with that of the b-frame, with its x-axis pointing north, z-axis pointing down, and y-axis completing a right-hand orthogonal frame, i.e., the north-down-east (NED) system; δv n is the velocity error in the n-frame; φ is the attitude error, composed of roll error, pitch error, and heading error; b g and b a are the residual bias error of the gyroscope and the accelerometer, respectively; s g and s a are the residual scale factor error of the gyroscope and the accelerometer, respectively. The continuous time system is modeled as, .
where, F(t) is the system dynamics matrix; G(t) is the noise distribution matrix of the system; and w(t) is the vector describing the system noise.
Due to the uncertainty in the sensors and the gravity field, the navigation parameters obtained from the INS mechanization equation contain errors. The phi-angle error model [40] is used here. Thus, the differential equation of the inertial navigation error is as follows: where, b denotes the body frame (b-frame), which has its axes coinciding with the IMU's body; δω b ib and δf b are the gyroscope measurement error and the accelerometer measurement error, respectively; ω n ie is the rotation rate vector of the e-frame with respect to the inertial frame (i-frame) projected to the n-frame; ω n en is the rotation rate vector of the n-frame with respect to the e-frame projected to the n-frame; δg n l is the gravity vector projected to the n-frame; C n b is the rotation matrix from the body frame (b-frame) to the n-frame; δr N and δr E are the position error on north and east direction of the n-frame, respectively; The variables b g , b a , s g , and s a are modeled by a first-order Gauss-Markov process.
where T is the correlation time of the process, and w is the driving white noise process.

Linearized Measurement Model
Since the IMU and the camera cannot be installed at the same place with parallel axes, the pose retrieved from the marker is different from that derived from the INS, which is known as the lever-arm effect and installation angle effect, as described in Equations (14) and (15), respectively. These misalignments have to be calibrated in advance or they will cause significant errors in both position and attitude. For easy understanding, it is assumed that the b-frame coincides with the UAV coordinate system (v-frame).
The installation relationship between the camera and IMU is depicted in Figure 6. The camera position derived from the IMU and camera is expressed as follows,  wherer e c is the camera position calculated by the IMU-derived position and lever arm; r e c is the camera position estimated by marker detection; l b c is the lever arm vector from the IMU center to the camera center in the b-frame; n rc is the position error of visual estimation; and []× represents the antisymmetric matrix of a vector.
where Since the landing maker can provide a reliable heading reference, it can also be exploited to correct INS errors as an aiding source. The camera attitude derived from the IMU and camera are expressed as follow, T n n c c where ˆn where, ρ is the third element of ρ and n z can be written as Since the landing maker can provide a reliable heading reference, it can also be exploited to correct INS errors as an aiding source. The camera attitude derived from the IMU and camera are expressed whereĈ n c and C n c are the camera attitude matrices derived from the INS and recovered from the visual solution, respectively; C b c is the rotation matrix from the c-frame to the b-frame; v is the heading error of the visual estimation; and ρ is the attitude difference between the vehicle attitude calculated by INS and vision. Hence, the observation vector can be expressed as a combination of the position error observation and heading error observation as follows: where, ρ heading is the third element of ρ and z n rc can be written as z n rc = C n e r e c − r e c = C n e δr e IMU + C e n C n b l b c × φ − C e n n rc = δr n + (

Experiments Description
To demonstrate the performance of the proposed system, we carried out field tests and analyzed the results from two aspects: the retrieved pose errors and reprojection errors of the corner points. The GNSS and high-precision fiber-optic IMU (POS320, MAP Space Time Navigation Technology Co., LTD, Wuhan, China) integrated navigation systems were used as references for the pose. The camera and consumer grade MEMS IMU (ICM20602)-integrated navigation system was the system under test. The camera used in the experiments was a global shutter camera with a 644 × 484 resolution (Mako G-030C, Allied Vision, Stadtroda, Germany). The IMU data rate was 50 fps, and the image frame rate was 20 fps. Figure 7 shows the experimental equipment platform. The spatial displacement of the IMU and the camera was calibrated in advance [41]. Table 1 provides the technical parameters of the two IMUs. The position and heading of the marker in the e-frame were previously obtained by GNSS post processed kinematic (PPK) static measurements. The size of the marker used in the tests was 29 cm × 29 cm.    Without the expensive motion capture system, it is too difficult, if not impossible, to obtain a reliable vehicle pose as ground truth with light load. A machine with four stepper motors, allowing precise movement of the camera in 3D space, was designed and built in [29] to evaluate the estimated position of the camera w.r.t. the maker. Although the high-precision GNSS/INS integrated systems can provide six-DOF pose references, they are usually too heavy (more than 10 kg) to be mounted on the UAV (like the one used in our experiments). Therefore, we hold the experimental device with reference system by hand to mimic the motion and trajectory of the UAV when landing to evaluate the pose accuracy convincingly. The GNSS/INS pose reference system was fixed on a rigid platform with the camera and MEMS IMU. The position and angle misalignment between the two IMUs was calibrated previously to project the ground truth to the MEMS IMU center. The position of the antenna phase center relative to the reference system was also measured in advance. The camera, IMU, and GNSS data were collected for postprocessing. We performed post processed kinematic (PPK) for the GNSS data process, then the smoothing results of the GNSS/INS integrated navigation were calculated  Figures 8 and 9. The experimental environment is shown in Figure 10.

Results Discussion
The position and attitude errors of the results by the pure visual solution in the third test are shown in Figures 11 and 12, respectively.

Results Discussion
The position and attitude errors of the results by the pure visual solution in the third test are shown in Figures 11 and 12, respectively.

Results Discussion
The position and attitude errors of the results by the pure visual solution in the third test are shown in Figures 11 and 12, respectively.

Results Discussion
The position and attitude errors of the results by the pure visual solution in the third test are shown in Figures 11 and 12, respectively.     The tables show that the positioning error of the visual solution does not exceed 0.01 m in any of the three directions. It can be observed that the positioning results on horizontal direction are less accurate and more unstable than that on vertical direction. Given that the available satellite number was beyond 20 and the carrier-phase ambiguities were resolved to their integer values in our experiments, the accuracy of PPK smoothing positioning could be thought to be at the centimeter level [42]. Meanwhile, it is obvious that the noise influences of visual positioning and PPK positioning are independent of each other, hence the absolute positioning accuracy of the proposed visual solution is equivalent to that of the smoothed PPK.
For the attitude errors, the root mean square error (RMSE) of the roll angle was 0.45 • (1σ), the pitch angle error was 0.43 • (1σ), and the heading angle error was 0.1 • (1σ). Since the corners of the marker are all on the horizontal plane (almost parallel to the p-frame), we can calculate the heading angle very accurately based on the position of the corresponding points on the image, but the roll and pitch estimations are less stable because of the lack of observations in the vertical direction.
The statistical results of the position and attitude errors of the visual/inertial fusion solution in five experiments are shown in Tables 4 and 5, respectively. The statistical results show that the positioning errors of the visual/inertial integrated solution are not more than 0.01 m in any of the three directions, which is consistent with the vision solution.
In addition, the reprojection errors was calculated to prove the pose accuracy. The corner points on the image that are not used for estimation (rejected by the RANSAC) were projected into the w-frame by the estimated pose, and the errors comparing to the real 3D corners on the horizontal direction were calculated. The results of the first test are shown in Figure 13, and the statistical results of the five experiments are shown in Table 6. The statistical results show that the positioning errors of the visual/inertial integrated solution are not more than 0.01 m in any of the three directions, which is consistent with the vision solution.
In addition, the reprojection errors was calculated to prove the pose accuracy. The corner points on the image that are not used for estimation (rejected by the RANSAC) were projected into the wframe by the estimated pose, and the errors comparing to the real 3D corners on the horizontal direction were calculated. The results of the first test are shown in Figure 13, and the statistical results of the five experiments are shown in Table 6.   The statistical results show that the corner reprojection error is 0.022 m (1σ ), which also verifies that the position accuracy of the proposed system is at the centimeter level.
For the attitude errors, it can be observed from the statistical results in Table 5 that the roll angle error is 0.30° (1σ ), the pitch angle error is 0.20° (1σ ), and the heading angle error is 0.07° (1σ ). Comparing to the pure visual solution, the roll and pitch angle accuracies have increased by 33% and 54% on average, respectively. Additionally, Figure 14 shows that the pitch angle and roll angle estimated by the integrated navigation are more stable and smooth than those of the visual solution. Due to the excellent sensibility to vehicle motion of the IMU, the horizontal angle vibration caused by the maneuver is alleviated by the fusion with INS.  The statistical results show that the corner reprojection error is 0.022 m (1σ), which also verifies that the position accuracy of the proposed system is at the centimeter level.
For the attitude errors, it can be observed from the statistical results in Table 5 that the roll angle error is 0.30 • (1σ), the pitch angle error is 0.20 • (1σ), and the heading angle error is 0.07 • (1σ). Comparing to the pure visual solution, the roll and pitch angle accuracies have increased by 33% and 54% on average, respectively. Additionally, Figure 14 shows that the pitch angle and roll angle estimated by the integrated navigation are more stable and smooth than those of the visual solution. Due to the excellent sensibility to vehicle motion of the IMU, the horizontal angle vibration caused by the maneuver is alleviated by the fusion with INS.
The attitude and heading reference systems (AHRS) [43,44] is widely used for continuous attitude estimation for flight control of UAVs since it is independent of GNSS. It exploits the IMU for pitch and roll angle estimation and magnetometer augmentation for improving heading stability. Because there was no magnetometer in our test device, we used the IMU data for AHRS and compared the accuracy and stability of the horizontal angles. The Mahony AHRS algorithm [45] was performed in the test. Figure 15 illustrates the roll and pitch errors of the two systems in the fourth test.
It can be observed that the roll and pitch angle estimation of the proposed system is much more accurate and stable than that of the AHRS. The AHRS provides pitch and roll angles relative to the earth gravity vector, thus, the accuracy of horizontal angles suffers from the vehicle acceleration since it cannot be separated by IMU only. In the tests, we held the platform to move in circles, therefore, the direction-alternating acceleration resulted in the sinusoidal curve of the angles error of the AHRS. As for the horizontal angles estimated by the proposed system, the error of INS-derived position was related to the attitude error, hence the difference between the high-precision position obtained by marker detection and that of INS could correct the error of roll and pitch angles by the filter. The attitude and heading reference systems (AHRS) [43,44] is widely used for continuous attitude estimation for flight control of UAVs since it is independent of GNSS. It exploits the IMU for pitch and roll angle estimation and magnetometer augmentation for improving heading stability. Because there was no magnetometer in our test device, we used the IMU data for AHRS and compared the accuracy and stability of the horizontal angles. The Mahony AHRS algorithm [45] was performed in the test. Figure 15 illustrates the roll and pitch errors of the two systems in the fourth test. It can be observed that the roll and pitch angle estimation of the proposed system is much more accurate and stable than that of the AHRS. The AHRS provides pitch and roll angles relative to the earth gravity vector, thus, the accuracy of horizontal angles suffers from the vehicle acceleration since it cannot be separated by IMU only. In the tests, we held the platform to move in circles, therefore, the direction-alternating acceleration resulted in the sinusoidal curve of the angles error of the AHRS. As for the horizontal angles estimated by the proposed system, the error of INS-derived position was related to the attitude error, hence the difference between the high-precision position obtained by marker detection and that of INS could correct the error of roll and pitch angles by the filter.
As aforementioned, INS can provide continuous positioning results when the external aiding The attitude and heading reference systems (AHRS) [43,44] is widely used for continuous attitude estimation for flight control of UAVs since it is independent of GNSS. It exploits the IMU for pitch and roll angle estimation and magnetometer augmentation for improving heading stability. Because there was no magnetometer in our test device, we used the IMU data for AHRS and compared the accuracy and stability of the horizontal angles. The Mahony AHRS algorithm [45] was performed in the test. Figure 15 illustrates the roll and pitch errors of the two systems in the fourth test. It can be observed that the roll and pitch angle estimation of the proposed system is much more accurate and stable than that of the AHRS. The AHRS provides pitch and roll angles relative to the earth gravity vector, thus, the accuracy of horizontal angles suffers from the vehicle acceleration since it cannot be separated by IMU only. In the tests, we held the platform to move in circles, therefore, the direction-alternating acceleration resulted in the sinusoidal curve of the angles error of the AHRS. As for the horizontal angles estimated by the proposed system, the error of INS-derived position was related to the attitude error, hence the difference between the high-precision position obtained by marker detection and that of INS could correct the error of roll and pitch angles by the filter.
As aforementioned, INS can provide continuous positioning results when the external aiding source is short-term unavailable. To illustrate that our system is robust to failure in marker extraction, As aforementioned, INS can provide continuous positioning results when the external aiding source is short-term unavailable. To illustrate that our system is robust to failure in marker extraction, we artificially interrupted the visual correction in data processing and calculated the errors of the positioning results. We set five outages for each test, and each outage was set to 5 s. Figures 16 and 17 are the position and reprojection errors of the simulation experiments in the fifth test, respectively.
The position error in the horizontal and vertical direction for the 25 outages in all the tests is 0.41 and 0.09 m (1σ), respectively, indicating that the proposed system can provide stable and reliable pose estimation when the marker is invalid occasionally. However, the navigation error of INS is accumulated with time as shown in Figure 17, hence long-term failure of marker detection will decay the performance.
To verify the robustness of the proposed system when the UAV is at a relative higher altitude or large-angle maneuver, we held the test platform 8 m away to capture the marker with no slant and with 40 degrees tilting, respectively. As the distance increases, the marker in the image becomes smaller and the field of view becomes larger, hence objects of different shapes around the marker came into view which caused the debris in the figures. Only the polygon containing correct grayscale information would be matched. Figure 18 presents the contours extraction and marker detection results in the two situations. The positioning errors (RMS) on vertical direction for these two tests are 0.12 and 0.23 m, respectively. The errors on horizontal direction are 0.21 and 0.30 m, respectively. It can be observed that the long distance and large tilt angle did not affect the marker detection. However, the positioning error increased since the marker was small on the image and the geometric condition of corner points was poor. Because the precision landing of the UAV depends primarily on the pose accuracy when it is close to the landing pad, this positioning results at such a distance are enough for the UAV to approach the target. we artificially interrupted the visual correction in data processing and calculated the errors of the positioning results. We set five outages for each test, and each outage was set to 5 s. Figures 16 and 17 are the position and reprojection errors of the simulation experiments in the fifth test, respectively.  The position error in the horizontal and vertical direction for the 25 outages in all the tests is 0.41 and 0.09 m (1σ ), respectively, indicating that the proposed system can provide stable and reliable pose estimation when the marker is invalid occasionally. However, the navigation error of INS is accumulated with time as shown in Figure 17, hence long-term failure of marker detection will decay the performance.
To verify the robustness of the proposed system when the UAV is at a relative higher altitude or large-angle maneuver, we held the test platform 8 m away to capture the marker with no slant and with 40 degrees tilting, respectively. As the distance increases, the marker in the image becomes smaller and the field of view becomes larger, hence objects of different shapes around the marker came into view which caused the debris in the figures. Only the polygon containing correct grayscale information would be matched. Figure 18 presents the contours extraction and marker detection results in the two situations. The positioning errors (RMS) on vertical direction for these two tests are 0.12 and 0.23 m, respectively. The errors on horizontal direction are 0.21 and 0.30 m, respectively. It can be observed that the long distance and large tilt angle did not affect the marker detection. However, the positioning error increased since the marker was small on the image and the geometric condition of corner points was poor. Because the precision landing of the UAV depends primarily on the pose accuracy when it is close to the landing pad, this positioning results at such a distance are enough for the UAV to approach the target. we artificially interrupted the visual correction in data processing and calculated the errors of the positioning results. We set five outages for each test, and each outage was set to 5 s. Figures 16 and 17 are the position and reprojection errors of the simulation experiments in the fifth test, respectively.  The position error in the horizontal and vertical direction for the 25 outages in all the tests is 0.41 and 0.09 m (1σ ), respectively, indicating that the proposed system can provide stable and reliable pose estimation when the marker is invalid occasionally. However, the navigation error of INS is accumulated with time as shown in Figure 17, hence long-term failure of marker detection will decay the performance.
To verify the robustness of the proposed system when the UAV is at a relative higher altitude or large-angle maneuver, we held the test platform 8 m away to capture the marker with no slant and with 40 degrees tilting, respectively. As the distance increases, the marker in the image becomes smaller and the field of view becomes larger, hence objects of different shapes around the marker came into view which caused the debris in the figures. Only the polygon containing correct grayscale information would be matched. Figure 18 presents the contours extraction and marker detection results in the two situations. The positioning errors (RMS) on vertical direction for these two tests are 0.12 and 0.23 m, respectively. The errors on horizontal direction are 0.21 and 0.30 m, respectively. It can be observed that the long distance and large tilt angle did not affect the marker detection. However, the positioning error increased since the marker was small on the image and the geometric condition of corner points was poor. Because the precision landing of the UAV depends primarily on the pose accuracy when it is close to the landing pad, this positioning results at such a distance are enough for the UAV to approach the target. Finally, we record the time spent on each module of the system over 1000 trials on the Tegra K1 developer kit (NVIDIA, Santa Clara, CA, USA), including a 4-Plus-1 ARM Cortex-A15 CPU. The results are shown in Table 7. Finally, we record the time spent on each module of the system over 1000 trials on the Tegra K1 developer kit (NVIDIA, Santa Clara, CA, USA), including a 4-Plus-1 ARM Cortex-A15 CPU. The results are shown in Table 7. It can be observed form the time consumption in Table 7 that the system runs efficiently. The most time-consuming part is marker detection, which requires a large number of pixel operations, while the other parts take relatively shorter times since they only need to process a small amount of low-dimension data. Overall, the average time consumption of the entire system is 32.26 ms, which is able to meet the real-time control requirements of UAVs.

Conclusions
In this paper, an artificial marker and MEMS IMU-based six-DoF pose estimation solution to meet the UAV landing requirements is proposed. The marker is specially designed for UAV landing due to its simplicity to generate and effectiveness to detect. An onboard monocular camera is used to recognize the marker and recover the absolute position and attitude of the UAV in the world frame. Then, this information is fused with IMU data by an EKF, thus our system still works in the case of visual failure, which is a fatal problem for previous systems. Improving upon the previous visual/inertial landing systems, the error components of the inertial sensor are modeled, and the horizontal angles are also estimated. Experiments show that the position accuracy is at centimeter level, the heading error is less than 0.1 • , and the roll and pitch angle errors are less than 1 • with the UAV at a height of about 1.4 m. The visual/inertial fused system improves the roll and pitch accuracy by 33% and 54%, respectively. Moreover, vision outage simulation tests show our system can provide relatively reliable position estimation (0.41 m drift in the horizontal direction and 0.09 m in the vertical direction in 5 s) when the maker occasionally occluded. Tests in challenging situations prove that the marker can be detected at a distance of 8 m and tilt of 40 degrees and the positioning accuracy is at decimeter level. The statistics of time consumption of the proposed system in an onboard computer show that it is able to meet the requirement of real-time control. The proposed system also applies to marker-based state estimation of indoor ground vehicles.
Future works will extend pose estimation to the entire route of UAV flight. Natural features are going to be used to estimate the UAV pose, when there is no marker in the field of vision. The marker can be used to correct the accumulated navigation error as long as it is detected.