A single-camera 360-degree real-time vision-based localization method with application to mobile robot trajectory tracking

A method is proposed for real-time vision-based localization in the 360° area around a three-dimensional (3D) reference object with a single camera. The problem is split into three subproblems. First, 360° 3D object recognition is proposed, in which a computer vision solution can recognize a reference object from all possible 360° locations. Second, 360° pose estimation is presented, in which the pose of a robot at all locations is estimated. Third, a 360° localization application is integrated with a closed-loop real-time trajectory tracking controller. The successful results of simulations and real experiments of trajectory tracking are also presented.


| INTRODUCTION
Vision is an important sensing modality for perceiving the environment. Cameras are cheap and provide information about the shape, size and texture of an object. They are needed by the vision-based control unit of a mobile robot. There are many studies on visual robot control or visual servoing. Visual servoing has been introduced to control the trajectory of a robot with respect to a target object using feedback obtained by a vision system [1]. The objective of closed-loop visual servoing is to minimize the difference between the actual pose and the desired pose using feedback. Existing approaches can be classified into two main categories: (1) position-based visual servoing, in which feedback is defined in terms of three-dimensional (3D) Cartesian information derived from the image(s), and (2) image-based visual servoing, in which feedback is defined directly in the image in terms of image features [1]. Before one can guide a mobile robot to the desired path, the self-localization problem must be solved. In this research, a robot uses an external object as its reference point to determine its pose and location. This research proposes an alternative vision-based localization with a single camera. Our research team developed a pose estimation method called Mirage. Using a desired pose and the current image of a 3D object, Mirage provides the estimated 6-degrees-of-freedom (6DOF) pose of the robot with minimum errors in real time. Mirage has been studied for a number of problems including robot trajectory tracking control [2], pose estimation with one camera [3], two cameras [4], and pose recovery [5]. We propose solutions to eliminate the 3D object recognition problem caused by the relative location of the robot with respect to the object, such as when the robot moves behind the reference 3D object. The outcomes of the simulations and real experiments demonstrate small pose estimation errors over the full 360° coverage around the 3D reference object.
The work focuses on a landmark-based system in which an autonomous system can determine its pose or follow a trajectory based on a landmark. In a landmark-based system, autonomous systems move, drive, or fly with respect to a known landmark. Hence, the landmark should always be detected by these systems to be able to function normally. For example, consider a case in which an unmanned aerial system (UAS) needs to gather information for a police station in a police car. The UAS may use a landmark on the police car or even the police car itself as a landmark. It may fly on a desired trajectory with a given pose, collect the information, and fly back using the landmark. Images of the landmark taken on the wireless camera on the UAS could be sent to a ground station for real-time processing of the UAS's pose. Then, the UAS's trajectory tracking controller can estimate the pose at which to fly the UAS accurately on the desired path.
Our contributions can be summarized as follows:
• Existing methods assumed that the same side of the landmark object is always in view of the camera. The authors eliminate that assumption and propose a method for recognising the landmark object from all sides, enabling robot localization irrespective of the location of the robot around the landmark object.
• The authors propose an efficient method for recognising what side of the object is in view. This method is simpler to implement and is computationally efficient compared with existing methods.
The rest of this work presents the material as follows. Section 2 presents a review of studies in vision-based camera localization. Section 3 explains details of the 360° vision-based localization method with a single camera system. In Sections 4 and 5, the results of both simulations and experiments are presented. Finally, the last section concludes the work.

| RELATED WORK
Vision-based camera localization methods fall into two categories according to the environment: those for unknown environments and those for known environments. Methods for unknown environments include online, real-time environment mapping, called simultaneous localization and mapping (SLAM) [6], and methods without online, real-time environment mapping, called structure from motion (SFM) [7].
SLAM with visual sensors is called Visual-SLAM. Visual-SLAM uses images as external information to estimate the location of the robot and constructs the environment map [8]. Visual-SLAM has gradually become the preferred choice for lunar rover positioning [8]. Visual-SLAM that uses a single camera to localize a mobile robot is a challenging problem because the image from the monocular camera lacks depth information. Therefore, multisensor approaches were introduced, which improve robustness and accuracy by combining the camera with inertial measurement units (IMUs) and global positioning systems (GPSs). Piao and Kim [9] proposed a method combining a monocular camera and an IMU sensor in mobile devices to implement a more accurate and stable SLAM system. Shi et al. [10] presented a framework in which GPS supports visual-SLAM with bundle adjustment, using a rigorous sensor model of a panoramic camera. Camera technologies such as infrared and depth cameras were also introduced to support pose estimation approaches. Su et al. [11] and Xie et al. [12] proposed systems for unmanned aerial vehicle (UAV) 6DOF pose estimation. Su et al. [11] used an infrared active marker and a colored passive marker to mark the relative pose of the target. With the two different types of sensors, a particle swarm optimization (PSO) algorithm was employed to solve for the optimal pose of a UAV. Xie et al. [12] also used PSO for the correspondence search between landmarks and image detections, and then applied the Kalman filter to track and predict the pose of the UAV in space.
For SFM approaches, Irschara et al. [7] presented a fast location recognition technique based on structure-from-motion point clouds. Burschka and Mair [13] proposed an SFM method to calculate the 6DOF pose directly from one camera with two images. Camposeco et al. [14] proposed pose estimation of calibrated pinhole and generalized cameras with respect to an SFM model by leveraging both 2D-3D and 2D-2D correspondences. As SFM developed, more works addressed global pose solving. For example, Li et al. [15] estimated a full 6DOF camera pose with respect to a large geo-registered 3D point cloud, combining research on image localization, landmark recognition, and 3D pose estimation. Park et al. [16] estimated the camera direction of a geotagged image using Google Street View and Google Earth satellite images. Cui and Tan [17] presented a global method for camera pose registration.
Camera localization with a known environment or known 3D space points is called a perspective-n-point (PnP) problem. The goal of a PnP problem is to find the 6DOF pose parameters. There are no solutions to the PnP problem when n = 1, 2 because the number of constraints is then less than the number of parameters. Many solutions have been proposed to solve the PnP pose estimation problem when n ≥ 3. The minimal case is the P3P problem [18], which uses only three matching points to solve for the unknown pose parameters. However, it may yield two to four solutions, not all of which are viable at the same time, which causes ambiguity [19]. This ambiguity can be resolved by using more feature points. For example, Guo et al. [20] proposed P4P, whereas Tang et al. [21] used P5P to find the pose parameters. Theoretically, pose estimation can be solved using a small number of feature points. However, Zheng et al. [22] suggested that using many feature points reduces the side effects of noise.
Our research team proposed Mirage-S (Mirage for a single camera) [3]. Mirage-S works well using eight feature points and has superior performance compared with eight well-known PnP methods [3]. Mirage-S yields a minimum average translational error of less than 6 cm and a minimum average rotational error of less than 2°.

| METHOD
In this section, we present a 360° pose estimation method with a single camera. The goal is to find the actual pose anywhere in the 360° area around the reference object. The goal is achieved in three stages.

| Three hundred sixty-degree three-dimensional object recognition
The first stage is to recognize the side of the reference 3D object faced by the camera from any point of view. The vision system needs a fixed 3D reference object in the real world. The 3D reference object is a geometry of two octagonal prisms, with the smaller one on top of the larger one, as shown in Figure 1. This 3D object serves as the inertial coordinate system in which the actual pose of the mobile robot is calculated. It has eight sides; each side has one face. Each face has a unique pattern of eight circles, as shown in Figure 2. The centers of these circles are used as feature points because Mirage-S yields good results using at least eight feature points. To recognize the 3D object, the system must match the feature points in the 3D model to the feature points on the image captured by the camera. The two main processes in this stage are feature extraction (explained in Section 3.1.1) and feature matching and correspondence (covered in Section 3.1.2).

| Feature extraction using MATLAB image processing toolbox
This stage is composed of detecting feature points by finding circles on the 3D object and then determining the face(s) viewed by the camera using the patterns of feature points. Our goal in this stage is to recognize the face(s) of the 3D reference object in the actual images captured by the robot in real time.
Detecting Feature Points. To find the feature points, we must first detect circles on the image. After capturing a greyscale image, Otsu's method [23] is used to find a single-intensity threshold. This threshold is used to separate pixels into two classes (foreground and background) to generate a binary image. Based on our 3D model, we defined the circle area to be between 150 and 440 pixels, assuming the distance to the 3D object to be 4-7 ft. By using the regionprops function in the MATLAB image processing toolbox [24], a set of regions of interest (ROIs) or blobs from the binary image is obtained to determine candidate circles of the object. In the ROIs, we determined all possible circles in the image by calculating the simple circularity c of each blob [25], as in Equation (1):

c = (4π × area) / perimeter²   (1)

where area and perimeter respectively represent the area and perimeter of the blob. The circularity of a circle (in the reference object) is 1 or close to 1. Based on our empirical experimental results, c should be greater than 0.75. Then, we eliminated all candidate circles that are outliers. The center points of the detected circles are used as feature points.

Identifying Face(s). Because each face of the 3D reference object has eight feature points, viewing one face of the object should be enough for Mirage-S, which requires at least eight feature points. However, if the number of feature points detected is less than eight, the system should ignore that image and retake a new one. It is also likely that the neighbouring faces are visible to the camera owing to the object's octagonal prism shape, leading to up to 24 feature points detectable by the system. Labeling the circles is important for clustering feature points on the same face. We used the MATLAB function bwlabel to label all circles. This function searches and labels all circles from top to bottom and left to right.
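The circle-candidate filter described above can be sketched as follows. This is a minimal Python illustration (not the authors' MATLAB code): blob statistics are assumed to come from a regionprops-style step, and the field names are hypothetical.

```python
import math

def circularity(area, perimeter):
    # Simple circularity metric from Equation (1): 4*pi*area / perimeter^2.
    # A perfect circle scores 1.0; elongated blobs score lower.
    return 4.0 * math.pi * area / (perimeter ** 2)

def candidate_circles(blobs, min_area=150, max_area=440, c_min=0.75):
    # Keep blobs whose area lies in [min_area, max_area] pixels and whose
    # circularity exceeds c_min, as described in the text. Each blob is a
    # dict with 'area', 'perimeter', and 'centroid' keys (illustrative names).
    centers = []
    for b in blobs:
        if not (min_area <= b["area"] <= max_area):
            continue
        if circularity(b["area"], b["perimeter"]) > c_min:
            centers.append(b["centroid"])  # center point used as a feature point
    return centers
```

As a sanity check, a disc of radius 10 px (area ≈ 314, perimeter ≈ 63) scores 1.0 and passes, whereas a 20 × 10 px rectangle scores about 0.70 and is rejected by the 0.75 threshold.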
In case the mobile robot trajectory tracking system does not support this function because of its complexity, a similar, simple labeling method is used to replace it. Because each pattern is composed of four feature points on the faces of the object, the detected feature points are clustered into groups of four. If the number of features is a multiple of 4, such as 8, 12, 16, 20, or 24, the system can straightforwardly cluster them into two, three, four, five, or six groups, respectively, with four features in each group. However, there are some cases in which the camera sees a face only partially and not all feature points on that face are detected, leading to a number of feature points that is not a multiple of 4 (e.g., 9, 10, 11, 13, or 15). In such a case, the highest multiple of 4 less than the number of detected feature points is used for analysis. For example, if the number of feature points is 13, the system uses only the first 12 points and clusters them into three groups. For each group, the system seeks the closest point to each point and draws the center line through each point. The center lines of all four points form the pattern that is required in Section 3.1.2.
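The simple clustering fallback can be sketched as follows: truncate to the highest multiple of 4, then greedily group each remaining point with its three nearest neighbors. This is an illustrative Python sketch of the described behavior, not the authors' implementation.

```python
import math

def group_into_fours(points):
    # Cluster labeled feature points into groups of four: truncate to the
    # highest multiple of 4, then greedily take each remaining point
    # together with its three nearest neighbors. `points` are (x, y)
    # tuples already labeled top-to-bottom, left-to-right.
    usable = points[: len(points) - (len(points) % 4)]
    remaining = list(usable)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        # three nearest neighbors of the seed among the remaining points
        remaining.sort(key=lambda p: math.hypot(p[0] - seed[0], p[1] - seed[1]))
        groups.append([seed] + remaining[:3])
        remaining = remaining[3:]
    return groups
```

With 13 detected points, for example, only the first 12 are used, yielding three groups of four, as in the text.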
3.1.2 | Feature matching and correspondence problem for recognizing the face of the object

The patterns of feature points are determined by the angles between feature points. Hence, we applied angle matching to recognize the face of the object. Figure 3 illustrates two vectors connecting three points. First, we compute the angle a in degrees between two vectors v1 and v2, as in Equation (3):

a = (180/π) × arccos( (v1 · v2) / (‖v1‖ ‖v2‖) )   (3)

Four points are connected by three lines, and there are two angles among these three lines.
We maintain a ground-truth database of angles for patterns on the faces of the object. Because each face has patterns on the upper and lower prism of the object, there are two groups of angles for each face (one for each prism). To solve the feature matching and correspondence problem, we compared the angles from the actual image with the angles in the database. We calculated the sum of absolute error between them and compared it with a defined threshold. The threshold is set to five per group in our experiments. If the error is less than or equal to the threshold, it is an indicator of the correct side or pattern. As mentioned earlier, the camera may view one, two, or three faces simultaneously. In the case of three sides, possible face trios are F8-F1-F2, F1-F2-F3, F2-F3-F4, …, F7-F8-F1, where Fn indicates the nth face. In the case of two sides, possible face pairs are F1-F2, F2-F3, F3-F4, …, F8-F1. In the case of one side, target patterns are F1, F2, F3, F4, …, F8. Each case has its own angle database, in which each face has two groups (for each prism) and each group has two angles. The angle database is provided in the Appendix.
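The angle computation of Equation (3) and the database comparison can be sketched as follows. The database layout (face name mapped to its two pattern angles) and the function names are illustrative; the actual angle database is the one given in the Appendix.

```python
import math

def angle_deg(v1, v2):
    # Angle in degrees between two 2D vectors, as in Equation (3).
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_a = dot / (math.hypot(*v1) * math.hypot(*v2))
    cos_a = max(-1.0, min(1.0, cos_a))  # guard against rounding
    return math.degrees(math.acos(cos_a))

def match_face(measured_angles, angle_db, threshold=5.0):
    # Compare measured pattern angles against each ground-truth entry and
    # return the face whose sum of absolute angle errors is within the
    # threshold (5 degrees per group in the experiments), else None.
    best_face, best_err = None, float("inf")
    for face, ref_angles in angle_db.items():
        err = sum(abs(m - r) for m, r in zip(measured_angles, ref_angles))
        if err <= threshold and err < best_err:
            best_face, best_err = face, err
    return best_face
```

For example, measured angles of (31°, 58°) match a database entry of (30°, 60°) with a summed error of 3°, below the threshold.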
The number of circles is used as an indicator of the number of faces seen by the camera. Registering 24, 16, or eight circles indicates the visibility of three faces, two faces, or one face, respectively. Hence, if the number of circles is eight, 16, or 24, we apply feature matching with one, two, or three sides, respectively. In other cases, at least one face is detected partially. If the number of circles is 12, the system first checks the case with two sides. If it does not match, the first eight points are selected for feature matching for the case with one side. If the number of circles is 20, the system first checks feature matching with the case with three sides. Twenty points may not match exactly to three neighboring faces; in such a case, the first 16 points are selected and feature matching is applied for the case with two sides. To minimize the processing time in mobile robot trajectory tracking, we discard frames with 12 or 20 circles.
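The dispatch rules above can be summarized in a small lookup function. The case names and the tuple layout are illustrative; the logic follows the rules in the text.

```python
def matching_plan(n_circles):
    # Return the ordered list of (case, points_to_use) matching attempts
    # implied by the number of detected circles.
    if n_circles == 8:
        return [("one_face", 8)]
    if n_circles == 16:
        return [("two_faces", 16)]
    if n_circles == 24:
        return [("three_faces", 24)]
    if n_circles == 12:
        # try two faces first; fall back to one face on the first 8 points
        return [("two_faces", 12), ("one_face", 8)]
    if n_circles == 20:
        # try three faces first; fall back to two faces on the first 16 points
        return [("three_faces", 20), ("two_faces", 16)]
    return []  # fewer than 8 points: ignore the image and retake
```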

| Experimental results
Figures 4 and 5 show some results from stage 1. Our algorithm works well under various conditions. We detect all circles in the actual images in Figure 4. Our method works even with a blurred image. We also tested our algorithm in a variety of cases that could happen in real situations. Figure 5 shows the incomplete outcomes. Sometimes our method failed to extract all of the features or formed an incorrect pattern. However, our method can still recognize the face despite the noise or other incomplete faces. These results show that our method yields reliable feature matching from different angles. As an additional experiment in stage 1, we manually moved the robot around the 3D reference object in a 5-ft radius and recorded videos at a frame rate of 10 frames/s as the robot was moving. This experiment was set up to observe real-time image processing in the vision system with continuous frames from nine videos. We compared our results with the SqueezeNet-V1.1 [26] and ResNet-50 [27] convolutional neural networks. For the deep learning methods, we randomly split the data into 75% training and 25% testing datasets. Table 1 compares the accuracy rates of SqueezeNet-V1.1, ResNet-50 and our proposed method; their average accuracy rates are 91.38%, 95.52% and 96.96%, respectively. These results indicate that both our 3D object recognition method and the deep learning methods yield high accuracy rates.

| Three hundred sixty-degree mobile robot pose estimation
The second stage is pose estimation with respect to the reference object. In this stage, we need to determine how the robot is positioned and oriented in the inertial Euclidean space. We propose calculating the pose of the robot using 2D images of reference points on the 3D reference object. In the literature, terms such as PnP or the localization problem have been used to describe this problem. The 6DOFs that describe the pose of the mobile robot in 3D space are q = [x, y, z, roll, pitch, yaw]. We propose calculating the robot pose using Mirage-S pose estimation [3]. We investigated the performance of our method in a real experiment in the laboratory using a high-precision Vicon motion capture system. Vicon is a key player in marker-based optoelectronic motion capture systems [28]. Owing to its ability to provide precise 3D positions and orientations, we used the Vicon results as the ground truth poses. We set the mobile robot at 64 different locations around the reference object. We used our proposed 360° vision-based pose estimation method to calculate the actual pose of the robot at the same 64 locations, as shown in Figure 6. Then, we calculated the pose error by comparing Vicon's ground truth poses with the outcomes of our pose estimation method. The observations showed that the standard deviation of translational error along the z axis is greater than that of the x and y axes; therefore, the translational error along the z axis is slightly more distributed than that of the x and y axes, as shown in Figure 7(a)-(c). Table 3 indicates that the average rotational errors around the x, y, and z axes are small, about 0.7° to 2°, whereas the error around the x axis is greater than that of the y and z axes. When we consider the standard deviation of rotational errors, the y axis yields a greater value than the x and z axes. As a result, the rotational error around the y axis shows more distribution than that of the x and z axes, as shown in Figure 8(a)-(c).

| Mobile robot 360° vision-based trajectory tracking control
We demonstrated the real-time capabilities of the proposed method and measured its performance by controlling a mobile robot on a commanded desired path using a closed-loop feedback controller. We defined the desired path of the mobile robot as a complete circular motion around the 3D reference object with a constant speed. By integrating the closed-loop feedback controller with the proposed vision-based pose estimation, the mobile robot automatically drives on the desired path around the 3D object with constant speed in real time without human control. The vision data are acquired from a camera mounted directly on the mobile robot (eye-in-hand). Visual features are extracted from the image and used by Mirage-S to calculate the actual pose. The closed-loop controller uses the error between the desired pose and the actual pose to command the wheel speeds that correct the error. The control law is defined in terms of the 3D Euclidean space. The performance of the controller was studied using the desired circular trajectory in Equation (4):

q_d = [x_d, y_d, θ_d]^T = [−R cos α, −R sin α, α − π/2]^T   (4)

where R is the circular path's radius, t is the given time, α̇ is the constant angular rate of rotation, and α = α̇t. We also calculate the desired velocity vector u_d, as seen in Equation (5):

u_d = [v_d, ω_d]^T = [Rα̇, α̇]^T   (5)

The mobile robot moves to pose q and the camera takes image p as the feedback control signal. Then, the feature extraction unit detects the feature points from the image. The 2D image-space feature points s are sent to the Mirage module to estimate the actual 3D pose q_m for the next motion.
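The desired circular trajectory and velocity of Equations (4) and (5) can be sketched as follows. The parameterization (counterclockwise motion starting at (−R, 0, −π/2), matching the starting pose used in the experiments) and the numeric angular rate are assumptions for illustration.

```python
import math

R = 174.0          # circular path radius (cm)
ALPHA_DOT = 0.18   # constant angular rate (rad/s); illustrative value

def desired_pose(t, r=R, alpha_dot=ALPHA_DOT):
    # q_d = [-R cos(a), -R sin(a), a - pi/2] with a = alpha_dot * t:
    # counterclockwise motion with the heading tangent to the circle.
    a = alpha_dot * t
    return (-r * math.cos(a), -r * math.sin(a), a - math.pi / 2)

def desired_velocity(r=R, alpha_dot=ALPHA_DOT):
    # u_d = [v_d, w_d] = [R * alpha_dot, alpha_dot]
    return (r * alpha_dot, alpha_dot)
```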

| Two-wheeled nonholonomic mobile robot
A two-wheeled nonholonomic mobile robot is depicted in Figure 10, where (x, y) is the wheel axis center position and θ is the robot's orientation. We focus on the kinematic model, which does not consider the mass or force that causes the motion. The kinematic motion equations of the mobile robot are equivalent to a unicycle model. The robot has a nonholonomic constraint, as in the equation:

ẋ(t) sin θ − ẏ(t) cos θ = 0

where ẋ(t) = dx/dt and ẏ(t) = dy/dt. According to this constraint, the robot cannot move in the lateral direction [29]. In our experiments, the mobile robot moves along a desired trajectory using a single camera mounted on the robot body. Each time the robot verifies its actual pose with respect to the desired pose, it moves to reduce the pose error between the actual pose and the desired pose.
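The unicycle kinematics can be sketched as a single integration step; the lateral constraint is then satisfied by construction. This is an illustrative Euler-integration sketch, not the authors' implementation.

```python
import math

def unicycle_step(pose, v, omega, dt):
    # One Euler step of the unicycle kinematic model:
    #   x' = v cos(theta), y' = v sin(theta), theta' = omega.
    # The nonholonomic constraint x' sin(theta) - y' cos(theta) = 0
    # holds by construction: the robot cannot move sideways.
    x, y, theta = pose
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)
```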

| Error model of mobile robot
The posture error model in our experiment is given in the global coordinate system. The global posture error is transformed to the local coordinate system as in Equation (8):

[e_x, e_y, e_θ]^T = [[cos θ, sin θ, 0], [−sin θ, cos θ, 0], [0, 0, 1]] (q_d − q)   (8)

where e_x gives the error in the driving direction, e_y gives the error in the lateral direction, and e_θ gives the error in the orientation [29]. As illustrated in Figure 9, the desired pose q_d, actual pose q_m, and desired velocity u_d are sent to the position-based control unit to compute the actual velocity vector u. Kanayama et al. [30] presented the control u = [v, ω]^T to solve the trajectory tracking problem; u is the feedback signal that controls the mobile robot on the desired path. Blazic [29] proposed the feedback signal control law with the terms e_s = sin e_θ and e_c = cos e_θ − 1, where a > 2, k > 0, k_x and k_s are positive functions, and n ∈ Z. For practical cases, n is a small number (usually −2, −1, 0, 1 or 2 are good choices) [29].
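The error transform of Equation (8) and the Kanayama tracking law [30] can be sketched as follows. The gain values are illustrative, not the gains used in the experiments.

```python
import math

def wrap(a):
    # wrap an angle to (-pi, pi]
    return math.atan2(math.sin(a), math.cos(a))

def posture_error(q_d, q):
    # Transform the global posture error q_d - q into the robot's local
    # frame, as in Equation (8).
    xd, yd, thd = q_d
    x, y, th = q
    dx, dy = xd - x, yd - y
    e_x = math.cos(th) * dx + math.sin(th) * dy   # driving-direction error
    e_y = -math.sin(th) * dx + math.cos(th) * dy  # lateral error
    e_th = wrap(thd - th)                         # orientation error
    return e_x, e_y, e_th

def kanayama_control(e, u_d, kx=1.0, ky=0.005, kth=0.5):
    # Kanayama tracking law [30]:
    #   v = v_d cos(e_th) + kx * e_x
    #   w = w_d + v_d * (ky * e_y + kth * sin(e_th))
    e_x, e_y, e_th = e
    v_d, w_d = u_d
    v = v_d * math.cos(e_th) + kx * e_x
    w = w_d + v_d * (ky * e_y + kth * math.sin(e_th))
    return v, w
```

When the pose error is zero, the law reduces to pure feedforward: v = v_d and ω = ω_d.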

| SIMULATIONS
We simulated closed-loop control by moving the robot around the reference 3D object with radii of 5.7 ft (174 cm), 9.8 ft (300 cm) and 19.7 ft (600 cm). Our simulations were implemented in MATLAB 2018a, running on a desktop computer (processor: AMD A4-6300B; memory: 16 GB; hard drive: 930 GB). We developed programs that follow the diagram in Figure 9. The 3DOF desired pose q_d = [x, y, θ]^T is defined as a function of time, as shown in Equations (11)-(13):

x_d(t) = −R cos(α̇t)   (11)
y_d(t) = −R sin(α̇t)   (12)
θ_d(t) = α̇t − π/2     (13)

where R = 174 cm (5.7 ft), 300 cm (9.8 ft), or 600 cm (19.7 ft). We simulated cases in which the mobile robot's starting position was set outside the initial desired position: we altered the x coordinate by −12, −40, and −85 cm in the simulations where R = 174, 300, and 600 cm, respectively. The images are generated using the actual poses, 64 feature points on the 3D model, and the intrinsic and extrinsic camera parameters. Mirage uses the desired pose q_d and a 2D feature-point image to calculate the actual pose q_m. The resulting trajectories of the mobile robot indicate accurate tracking for R = 174, 300 and 600 cm, as presented in Figure 11(a)-(c), respectively. In addition, we generated a visualization of the simulations, such as the robot moving on the circular path and the 3D reference model in a camera view, to verify the proposed method in the simulations.
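A minimal closed-loop simulation can be sketched in Python (not the authors' MATLAB programs). It assumes a Kanayama-style tracking law with illustrative gains, perfect pose feedback standing in for the vision-based estimate, and the robot starting 12 cm outside the desired start point, as in the R = 174 cm case.

```python
import math

def wrap(a):
    return math.atan2(math.sin(a), math.cos(a))

def simulate(R=174.0, alpha_dot=0.18, x_offset=-12.0, dt=0.01, t_end=60.0,
             kx=1.0, ky=0.005, kth=0.5):
    # Closed-loop circular tracking with a unicycle robot. The desired start
    # pose is (-R, 0, -pi/2); the robot's x coordinate is offset by x_offset,
    # and perfect pose feedback stands in for the vision-based estimate.
    x, y, th = -R + x_offset, 0.0, -math.pi / 2
    t = 0.0
    while t < t_end:
        # desired pose and velocity on the circular path
        a = alpha_dot * t
        xd, yd, thd = -R * math.cos(a), -R * math.sin(a), a - math.pi / 2
        v_d, w_d = R * alpha_dot, alpha_dot
        # posture error in the robot frame
        dx, dy = xd - x, yd - y
        e_x = math.cos(th) * dx + math.sin(th) * dy
        e_y = -math.sin(th) * dx + math.cos(th) * dy
        e_th = wrap(thd - th)
        # Kanayama-style tracking law (illustrative gains)
        v = v_d * math.cos(e_th) + kx * e_x
        w = w_d + v_d * (ky * e_y + kth * math.sin(e_th))
        # unicycle kinematics, Euler integration
        x += v * math.cos(th) * dt
        y += v * math.sin(th) * dt
        th += w * dt
        t += dt
    return x, y, th
```

After a transient, the robot settles onto the circle: the radial error decays from the initial 12 cm to near zero.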

| EXPERIMENTS
After verifying our method through the simulations, we conducted two types of experiments to show its validity. First, we deployed our algorithm on a robot with a Logitech C525 Web camera installed, shown in Figure 12, and evaluated its performance. Second, we compared our method with one of the deep pose learning algorithms, PoseNet. Before performing pose estimation experiments, the intrinsic and extrinsic camera parameters must be computed. The intrinsic and the extrinsic parameters of the camera were found in two separate stages. In the first stage, the camera calibration toolbox for MATLAB [31] was used to estimate the intrinsic parameters (f_x, f_y, o_x, o_y), where f_x and f_y are the focal lengths and (o_x, o_y) is the principal point. The calibration toolbox requires a 7 × 9 square checkerboard image, as shown in Figure 13(a). An undistorted camera was assumed. In the second stage, the extrinsic camera parameters were estimated using a high-precision Vicon motion capture system. Two similar quadrilateral shapes of reflective Vicon markers were placed at the centers of the camera and the mobile robot, shown in Figure 13(b). After results were obtained from the lab, the optimal rotation and translation of the camera relative to the mobile robot, or extrinsic parameters [x, y, z, θ, ϕ, ψ], were calculated.
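The role of the intrinsic parameters can be illustrated by the pinhole projection they define: a 3D point in the camera frame maps to pixel coordinates through (f_x, f_y) and the principal point (o_x, o_y). The numeric values below are illustrative, not the calibrated parameters of the C525, and distortion is assumed negligible, as in the calibration above.

```python
def project(point_cam, fx, fy, ox, oy):
    # Pinhole projection of a 3D point (X, Y, Z) in the camera frame to
    # pixel coordinates (u, v), using the intrinsic parameters:
    #   u = fx * X / Z + ox,  v = fy * Y / Z + oy
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    u = fx * X / Z + ox
    v = fy * Y / Z + oy
    return u, v
```

A point on the optical axis projects exactly to the principal point, which is a convenient sanity check after calibration.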

| Real-world experiments
First, we modified the simulation code. We replaced the simulated camera and robot model with the actual camera and robot. The modified code was integrated with Simulink Real-Time running on MATLAB 2018a on the same desktop computer used for the simulations. Second, we established a connection via Ethernet cable between the host computer (the desktop computer) and the embedded target computer on the robot (processor: Intel Atom 1.8 GHz; memory: 4 GB; hard drive: 80 GB). Third, we compiled, built, and transferred the execution code in binary format to the embedded target computer. Finally, we ran the code on the embedded target computer and obtained results from the Simulink scopes to process and interpret the data. In addition, the robot's wheel speed control hardware executes the closed-loop controller's outputs. To function, the robot speed controller hardware requires the velocities of the left and right wheels. As mentioned in Section 3.3.3, the feedback controller calculates the actual velocity vector u = [v, ω]^T, where v is the linear velocity (cm/s) and ω is the angular velocity (rad/s). Therefore, v and ω must be converted to the velocities of the left and right wheels. Equation (14) shows the formulas to calculate the velocities v_L and v_R of the left and right wheels:

v_L = v − ωT/2,  v_R = v + ωT/2   (14)

where T is the distance between the centers of the wheels.
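The conversion of Equation (14) can be sketched as a one-line helper. The track width used below is illustrative; the paper does not state the robot's actual wheel separation T.

```python
def wheel_speeds(v, omega, track):
    # Differential-drive conversion of Equation (14): split the body
    # velocity (v, omega) into left/right wheel speeds, where `track`
    # is the distance T between the wheel centers.
    v_left = v - omega * track / 2.0
    v_right = v + omega * track / 2.0
    return v_left, v_right
```

As a sanity check, the mean of the two wheel speeds recovers v and their difference divided by T recovers ω, which is consistent with the reported RMS values (the mean of 27.4068 and 35.1082 cm/s is close to the 31.2430 cm/s RMS of v).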
We performed experiments on closed-loop control using vision signals. As shown in Figure 12, the mobile robot was commanded to move along the circular path. The starting and ending poses were [−174, 0, −π/2]^T; the camera faced the 3D model as the robot headed along the path. The controller gains were selected to be k_x = 2.5, k = 0.7, k_s = 3.5, a = 3, and n = 2. The robot started at 0 s and began moving along the circular track. The robot continued on the circular path and passed the end point, which is the same as the starting point, at approximately 35 s. We used Simulink to measure the processing times for the image acquisition, image processing, Mirage pose estimation, and control blocks; they were 5.345, 25.128, 57.982, and 8.266 ms, respectively. The total processing time was thus estimated at about 100 ms, so the real-time program was set to run at 10 Hz. The trajectory of the mobile robot in the experiment is presented in Figure 14. In addition, the controller inputs for the robot's motion are presented in Figure 15. The translational velocity v varies with a root-mean-square (RMS) value of 31.2430 cm/s, as in Figure 15(a). The angular velocity ω varies with an RMS of 0.1804 rad/s, as in Figure 15(b). v_L and v_R in Figure 15(c) show variations with RMS values of 27.4068 and 35.1082 cm/s, respectively. The experimental results show that the mobile robot successfully moved on the desired trajectory using the proposed method.
POSIT [32] is a well-known and popular method that approximates the perspective projection with a scaled orthographic projection and finds the rotation matrix and translation vector of the object by repeatedly solving a linear system. LHM [33] uses an object-space collinearity error and derives an iterative method with global convergence to compute the orthogonal rotation matrix. DLS [34] estimates pose without iterations or initialization parameters; it employs the camera measurement equation and formulates a nonlinear least-squares cost function.
PoseNet is the first learning-based architecture that introduced the idea of regressing the absolute pose with a deep architecture. Kendall et al. [35] suggested learning the localization pipeline in its entirety: starting from an RGB image, a network learns to regress the camera pose in an end-to-end supervised manner. PoseNet is a modification of the GoogLeNet architecture [36], a deep convolutional network with 22 layers. The regression architecture can be abstracted into three components: (1) an encoder that generates the visual encoding vector (v), (2) a localizer that produces the localization features vector (u), and (3) a regressor that regresses the pose (p). PoseNet was trained with stochastic gradient descent to minimize the pose error. The experiments were set up in the Vicon lab to train PoseNet for our purposes. We manually moved the robot around the 3D reference object within radii of 4, 5 and 6 ft. While moving the robot, we recorded a video with the camera on the robot. Simultaneously, the pose data were recorded by Vicon at a frame rate of 60 frames/s. The precise 3D positions and orientations from Vicon were used as ground truth data. After that, we extracted all frames from the video and obtained the pose data from Vicon. Because they were recorded simultaneously, the frames and pose data match properly. POSIT, LHM, DLS and Mirage-S used the images and the pose data from Vicon as the desired poses to calculate the estimated poses. In the case of PoseNet, we randomly split the images with pose data into 80% training and 20% testing datasets. The code of the TensorFlow PoseNet framework, modified from the implementation published on GitHub by Kent Sommer [37], was provided by Dr. Mukherjee from the University of Alabama in Huntsville. There are two subprocesses for PoseNet: (1) Training: we trained the network with the training dataset and the initial weights.
The training process was run for 100, 1000, and 10,000 iterations, with a batch size of 25 per iteration. The training process yielded the adjusted weights used for testing. (2) Testing: we tested the network with the testing dataset and the adjusted weights. We compared the estimated poses and the ground truth data by computing the absolute values of the translational and rotational errors. Tables 4, 5, and 6 present the median translational and rotational errors of PoseNet, POSIT, LHM, DLS and Mirage-S. In a comparison of the results of the well-known PnP methods and Mirage-S, the outcomes indicate that Mirage-S performed with minimal errors (Table 4).
PoseNet yields its minimum errors when run for 10,000 iterations for each radius. Its median translational error is about 3-5 cm and its median rotational error is around 3°. PoseNet also yields consistent errors for different radii, which indicates that PoseNet produces robust results. On the other hand, Mirage-S yields a median translational error of around 5-6 cm, whereas its median rotational error is close to that of PoseNet (Tables 5 and 6). When we considered the processing time, PoseNet performs pose estimation in around 5 ms for testing, whereas Mirage-S completes pose estimation in about 100 ms. The major advantage of Mirage-S compared with PoseNet is that its errors can be tracked. Because PoseNet uses a deep learning architecture, it is hard to verify what it learned or did not learn. PoseNet should be trained on a computer with a GPU for performance reasons, whereas Mirage-S does not require a GPU or a high-end computer.
PoseNet works well when the training and testing datasets are from the same source data. Sattler et al. [38] demonstrated the limitations of convolutional neural network-based absolute camera pose regression techniques using different training and testing datasets. The estimated pose results from PoseNet are significantly noisier and do not lie on the testing poses; instead, they lie on the training poses [38]. In addition, absolute camera pose regression is not guaranteed to generalize from the training data in practical scenarios [38].

| CONCLUSIONS
We proposed a method for real-time vision-based localization in the full 360° area around a 3D reference object. There are three main subproblems. First is the 360° object recognition problem. To solve it, we applied a computer vision solution that recognizes a reference 3D object from all possible 360° views. The reference 3D object is defined by distinct faces and patterns that the robot can see from anywhere around it, using feature extraction and feature matching based on angles from the patterns on the object. The experimental results in a variety of cases show that the solution can recognize the unique face(s) despite noise or other incomplete patterns, and it yields a high accuracy rate. Second is the 360° pose estimation problem, in which we use Mirage pose estimation with a single camera to calculate the pose of a robot at all possible 360° locations around the reference 3D object. The results of the experiment with the camera poses indicated small translational and rotational pose errors. Third is the application of 360° localization. We integrated 360° localization with a closed-loop real-time trajectory tracking controller. The mobile robot is expected to move on a desired path covering the whole 360° area around the 3D reference object. The simulation results indicated minimum average errors in 3DOF (x axis, y axis, and heading). The experimental results show that the mobile robot successfully moves along the desired circular track. The simulations and real experiments confirmed the feasibility and performance of the proposed approach.