Mobile Robot Self-Localization System Using Single Webcam Distance Measurement Technology in Indoor Environments

A single-webcam distance measurement technique for indoor robot localization is proposed in this paper. The proposed localization technique uses webcams that are available in an existing surveillance environment. The developed image-based distance measurement system (IBDMS) and parallel lines distance measurement system (PLDMS) have two merits. Firstly, only one webcam is required for estimating the distance. Secondly, the set-up of IBDMS and PLDMS is easy: only one rectangular pattern of known dimensions, i.e., a ground tile, is needed. Common and simple image processing techniques, e.g., background subtraction, are used to capture the robot in real time. Thus, for the purposes of indoor robot localization, the proposed method needs neither expensive high-resolution webcams nor complicated pattern recognition methods, only a few simple estimation formulas. The experimental results show that the proposed robot localization method is reliable and effective in an indoor environment.


Introduction
Autonomous robots have a wide range of potential applications, including security patrol, house cleaning, and even warfare. Most of them are equipped with position measurement systems (PMSs) for precisely locating themselves and navigating in their working fields. Three typical techniques [1] in PMSs are triangulation, scene analysis, and proximity. The triangulation technique uses the geometric properties of triangles to compute object locations; its best-known example is the Global Positioning System (GPS). However, because GPS is satellite dependent, it has an inherent difficulty in accurately determining the locations of objects within a building [2]. A proximity location-sensing technique determines when an object is "near" a known location, with the object's presence sensed via some limited-range physical phenomenon. Well-known examples are detecting physical contact [3,4] and monitoring wireless cellular access points [5,6]. The scene analysis location-sensing technique uses features of a scene observed from a particular vantage point to draw conclusions about the location of the observer or of objects in the scene. Well-known examples are radar location systems [7] and visual image location systems [8].
For indoor localization, infrared light [6], ultrasound [9], laser rangefinders [10,11], RFID [12], and radar [13] are the most popular wireless techniques. Diffuse infrared technology is commonly used for indoor localization, but its short-range signal transmission and line-of-sight requirements limit its adoption. Ultrasonic localization [9] uses the time-of-flight measurement technique to provide location information. However, ultrasound requires a great deal of infrastructure to be highly effective and accurate. Laser distance measurement works by measuring the time it takes for laser light to be reflected off a target and returned to the sender. Because the laser rangefinder is a very accurate and quick measurement device, it is widely used in many applications. In [10,11], Subramanian et al. and Barawid et al. proposed autonomous vehicle guidance systems based on a laser rangefinder, which was used to acquire environment distance information for identifying and avoiding obstacles during navigation. In [14], Thrun et al. provided an autonomous navigation method based on a particle filter algorithm, in which the measurements received by the laser rangefinder are used to compute the likelihood of the particles. These papers confirm that laser rangefinders are high-performance, high-accuracy measurement equipment. However, their high performance comes at a high hardware cost. RFID-based localization uses RF tags and a reader with an antenna to locate objects, but each tag can only be detected over distances of approximately 4 to 6 meters. To improve the low precision of location positioning, the well-known SpotON [15] technology uses an aggregation algorithm based on radio signal strength analysis for 3D location sensing. However, a complete system is not available yet.
An RF-based RADAR system [7,16,17] uses the 802.11 network adapter to measure signal strengths at multiple base stations positioned to provide overlapping coverage for locating and tracking objects inside buildings. Unfortunately, most such systems to date do not achieve the desired overall accuracy. In indoor localization for robots, most of these wireless techniques are used to scan the static obstacles around the robots, and the location is calculated by matching those scans with a metric map of the environment [18,19]. In dynamic environments, however, the detected static features are often not enough for robust localization.
Li et al. [20] proposed a neural network (NN)-based mobile phone localization technique using Bluetooth connectivity. In this large-scale network, mobile phones equipped with GPS act as beacons, and other phones connect to the beacon phones via Bluetooth. By formulating the Bluetooth network localization as an optimization problem, a recurrent neural network is developed to find the solutions in a distributed manner in real time. However, the sampling rate of Bluetooth is in general relatively low, so accurately estimating the position of a moving object in real time is not easy. In [21], a recurrent neural network was proposed to search for a desirable solution to the range-free localization of wireless sensor networks (WSNs), under the condition that the WSN can be formulated as a class of nonlinear inequalities defined on a graph. Taking advantage of the parallel computation of the NN, the proposed approach can effectively solve the WSN localization problem, although the limited transmission bandwidth might cause difficulty in the localization.
Recently, image-based techniques have been preferred over wireless techniques [4,5,9,22], because cameras are passive sensors and are not easily disturbed by other sensors. In [1], a portable PC capable of marker detection, image sequence matching, and location recognition was proposed for an indoor navigation task. JongBae et al. used the augmented reality (AR) technique to achieve an average location recognition success rate of 89%, though the extra cost of this technique must be considered. In [23], Cheoket et al. provided a method of localization and navigation in wide indoor areas with a wearable computer for human users. Though the set-up cost is lower, this method is not easy to implement and set up for users who do not know the basic concepts of electronic circuit analysis and design. Furthermore, an image-based method for distance measurement was proposed in [24-29]. According to the transform equations in those papers, the distance can be calculated from the ratio between the size of the pre-defined reference points and that of the measured object. In recent years, growing importance has been placed on research in two-camera localization systems [30,31]. From two different images, the object distances can be calculated by a triangular relationship. However, to ensure measuring reliability, the photography angle and the distance between the two cameras must be kept fixed. Because two cameras are used as the measuring device, the set-up costs of the experimental environment increase.
Nowadays, surveillance systems exist in most modern buildings, and cameras have been installed around these buildings. In general, one camera covers one specific area. In order to locate an autonomous patrolling robot using the existing cameras in a building, a single-camera localization technique must be developed. This study aims to develop a single-webcam distance measurement technique for indoor robot localization with the purposes of saving set-up costs and increasing the accuracy of distance measurements. In our approach, the working area setting can be kept as simple as possible, because the existing webcams in the surveillance environment can be utilized without any change. For a single webcam in its working coverage area, we develop an improved image-based distance measurement system (IBDMS) and a parallel lines distance measurement system (PLDMS) to measure the location of a robot according to a known-size rectangle pattern, i.e., a ground tile. This measurement system uses four points, i.e., the four corners of a ground tile, to form a pair of parallel lines in the webcam image. Referring to the pair of parallel lines, we can measure the location of a robot within the visual range of a webcam. Because the monitoring area of an individual webcam is fixed, a few simple image processing strategies are used to search for the robot before going through IBDMS and PLDMS. First, we use a low-pass filter and an on-line background update method to reduce background noise, and adopt image morphology to complete the foreground information and to remove slight noise. Once the mobile robot is found in the image, IBDMS and PLDMS can obtain its real-world coordinates. Finally, the location of the mobile robot can be shown on the two-dimensional map immediately. Thus, for the purpose of indoor robot localization, the proposed method does not need complicated pattern recognition methods, only a few simple estimation formulas.

Photography Methods
Before a robot is located by the proposed single-webcam localization technique, the acquired images must go through image processing to remove noise and unnecessary information. The techniques used include gray-scale conversion, background subtraction, morphological image processing, and connected-components labeling. Next, we briefly discuss the procedures [32] of the image processing techniques used in this paper.

Camera Calibration
Distortion can occur in captured images, especially if cheap webcams are used. To attenuate the distortion of the captured images and thus increase the accuracy of the robot localization task, camera calibration should be done before localization is attempted. OpenCV takes into account the radial and tangential factors of the image distortion problem. The radial factor can be calculated by the following equations:

$$x_{corrected} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \qquad (1)$$

$$y_{corrected} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \qquad (2)$$

The tangential distortion can be corrected via the following equations:

$$x_{corrected} = x + [\,2 p_1 x y + p_2 (r^2 + 2 x^2)\,] \qquad (3)$$

$$y_{corrected} = y + [\,p_1 (r^2 + 2 y^2) + 2 p_2 x y\,] \qquad (4)$$

In Equations (1-4), the pixel $(x, y)$ is the image coordinate in the input image, $(x_{corrected}, y_{corrected})$ is the image coordinate in the corrected output image, and $r^2 = x^2 + y^2$. The distortion coefficient vector can be represented as $(k_1, k_2, p_1, p_2, k_3)$. Moreover, the unit conversion can be represented as:

$$\begin{bmatrix} x \\ y \\ w \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \qquad (5)$$

where $w$ is explained by the use of the homography coordinate system (and $w = Z$), $f_x$ and $f_y$ are the focal lengths, and $(c_x, c_y)$ is the optical center, both expressed in pixels.

Figure 1a shows the images before calibration for the three webcams, and Figure 1b shows the images after the calibration procedure. For the 1st webcam (HD Webcam C310, Logitech, Lausanne, Switzerland) in our experimental environment, the distortion coefficient vector is given by Equation (6) and the camera conversion matrix by Equation (7). For the 2nd webcam (HD Pro Webcam C920, Logitech, Lausanne, Switzerland), the distortion coefficient vector is given by Equation (8) and the camera conversion matrix by Equation (9). For the 3rd webcam (HD Webcam PC235, Ronald, Osaka, Japan), the distortion coefficient vector is given by Equation (10) and the camera conversion matrix by Equation (11).
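As an illustration of the radial and tangential model described above, the following minimal Python sketch applies both terms to a single normalized image point and then converts the result to pixel units. The function names and the numeric values in the usage lines are hypothetical, and the coefficient names `k1, k2, k3, p1, p2, fx, fy, cx, cy` follow the usual OpenCV naming as an assumption; the calibrated values of the three webcams are not reproduced here.

```python
def apply_distortion_model(x, y, k1, k2, p1, p2, k3=0.0):
    """Apply the radial and tangential terms to a normalized point (x, y).

    The radial factor and the tangential offsets are combined in one
    expression, which is how the two corrections are commonly applied together.
    """
    r2 = x * x + y * y                                   # r^2 = x^2 + y^2
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_out = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_out = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_out, y_out


def to_pixel(x, y, fx, fy, cx, cy):
    """Convert a normalized point to pixel units with the camera matrix entries."""
    return fx * x + cx, fy * y + cy


# Hypothetical coefficients, for illustration only.
xd, yd = apply_distortion_model(0.2, -0.1, k1=-0.05, k2=0.01, p1=0.001, p2=0.0005)
u, v = to_pixel(xd, yd, fx=800.0, fy=800.0, cx=320.0, cy=240.0)
```

With all coefficients set to zero the point is returned unchanged, which is a quick sanity check when wiring in real calibration results.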

Image Segmentation
In a grayscale image, the value of each pixel carries only intensity information. Such an image is also known as a black-and-white image and is composed exclusively of shades of gray, with black at the weakest intensity and white at the strongest. The gray-scale technique converts a color image into a black-and-white image. The luminance $f_l(x, y)$ of the pixel $(x, y)$ is described by Equation (12), where $R(x, y)$, $G(x, y)$, and $B(x, y)$ are the color values at the pixel $(x, y)$.
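The segmentation steps named in this paper (gray-scale conversion, background subtraction, and binarization) can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions: the luminance weights 0.299, 0.587, and 0.114 are the common ITU-R BT.601 values and are assumed here rather than taken from Equation (12), and the function names and the threshold are hypothetical.

```python
def to_gray(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Convert rows of (R, G, B) tuples into rows of luminance values.

    The default weights are the BT.601 values (an assumption of this sketch).
    """
    wr, wg, wb = weights
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in rgb_image]


def average_frames(frames):
    """Build a background image as the pixel-wise mean of several gray frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[i][j] for f in frames) / n for j in range(cols)]
            for i in range(rows)]


def foreground_mask(gray, background, threshold):
    """Mark a pixel as foreground (1) when it differs enough from the background."""
    return [[1 if abs(g - b) > threshold else 0 for g, b in zip(g_row, b_row)]
            for g_row, b_row in zip(gray, background)]


# Tiny hypothetical example: one row, two pixels.
background = average_frames([[[10.0, 50.0]], [[14.0, 50.0]]])
mask = foreground_mask([[10.0, 200.0]], background, threshold=30)
```

In the set-up described later in the paper, the first background image is the average of 150 consecutive frames; `average_frames` plays that role here for any number of frames.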

Morphological Image Processing
After image segmentation, discontinuous edges and noise may appear in the foreground image and cause wrong judgments during object identification. Therefore, this paper utilizes morphological image processing operations, namely dilation, erosion, opening, and closing, so that the underlying shapes can be identified and the image can be reconstructed as well as possible from its noisy precursor.
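To make the effect of these four operations concrete, here is a minimal pure-Python sketch on a binary mask. It assumes a 3 × 3 square structuring element and ignores pixels outside the image border; both choices are assumptions of this sketch, since the structuring element is not specified in the text.

```python
def _window(mask, i, j):
    """Return the values of the in-image 3x3 neighbourhood around (i, j)."""
    rows, cols = len(mask), len(mask[0])
    return [mask[a][b]
            for a in range(max(0, i - 1), min(rows, i + 2))
            for b in range(max(0, j - 1), min(cols, j + 2))]


def erode(mask):
    """A pixel stays 1 only if every pixel in its neighbourhood is 1."""
    return [[1 if all(_window(mask, i, j)) else 0 for j in range(len(mask[0]))]
            for i in range(len(mask))]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its neighbourhood is 1."""
    return [[1 if any(_window(mask, i, j)) else 0 for j in range(len(mask[0]))]
            for i in range(len(mask))]


def opening(mask):
    """Erosion followed by dilation: removes small isolated foreground noise."""
    return dilate(erode(mask))


def closing(mask):
    """Dilation followed by erosion: fills small holes and gaps."""
    return erode(dilate(mask))
```

Opening is the natural choice for removing speckle left by background subtraction, while closing reconnects a robot silhouette that was broken into pieces.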

Connected-Components Labeling
The aim of connected-component labeling is to identify connected components that share similar pixel intensity values and to group them together. The labeling procedure scans an image and groups pixels into one or more components according to pixel connectivity. Once all groups are determined, each pixel is labeled with a gray level according to the component it belongs to. With the techniques discussed above, we can locate a robot in the captured image, i.e., in the image-domain. Figure 2 shows the overall scheme of the image processing, and the experimental results are shown in Figure 3. In Figure 4, $R_{center}$ is the center of the robot in the processed image and can be easily calculated by simple averaging. In this paper, $R_{center}$ stands for the center coordinate of the robot in the image-domain.
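The labeling and averaging steps can be sketched as follows. The sketch assumes 8-connectivity and treats the largest connected component as the robot; both are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected foreground regions; returns (label image, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if mask[i][j] and not labels[i][j]:
                count += 1
                labels[i][j] = count
                queue = deque([(i, j)])
                while queue:                      # breadth-first flood fill
                    a, b = queue.popleft()
                    for da in (-1, 0, 1):
                        for db in (-1, 0, 1):
                            na, nb = a + da, b + db
                            if (0 <= na < rows and 0 <= nb < cols
                                    and mask[na][nb] and not labels[na][nb]):
                                labels[na][nb] = count
                                queue.append((na, nb))
    return labels, count


def largest_component_center(mask):
    """Return the (x, y) mean of the largest component, or None if empty."""
    labels, count = label_components(mask)
    if count == 0:
        return None
    pixels = {}
    for i, row in enumerate(labels):
        for j, lab in enumerate(row):
            if lab:
                pixels.setdefault(lab, []).append((j, i))   # store as (x, y)
    biggest = max(pixels.values(), key=len)
    n = len(biggest)
    return (sum(x for x, _ in biggest) / n, sum(y for _, y in biggest) / n)
```

The returned pair corresponds to the image-domain center of the robot, here computed as the plain average of the pixel coordinates in the selected component.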

Mobile Robot Localization System with Single Webcam
After the captured images go through image processing, we can locate the robot in the image-domain. Then, the coordinates of the robot must be calculated. That is, two distances, along the x-axis and the y-axis, should be determined: (1) $d_i$, the distance between $R_{center}$ and the webcam; and (2) $w_i$, the distance between $R_{center}$ and the wall, as shown in Figure 4. In this paper the IBDMS is used to calculate the distance $d_i$, and the PLDMS is used to calculate the distance $w_i$. Figure 5 shows the map of our experimental environment. In the map, the coordinate of the first webcam is set to $(x_1, y_1)$, the second one to $(x_2, y_2)$, and the third one to $(x_3, y_3)$.

Building on the image-based distance measurement methods of [24-29], the IBDMS is developed in this paper for calculating the distances $d_i$ $(i = 1, 2, 3)$; it works with a single webcam and depends only on a rectangle of known dimensions, i.e., a ground tile. The idea of the IBDMS comes from the triangular relationship shown in Figure 6: we first capture an image containing a rectangle of known dimensions, and then find the proportional relationship between the real dimensions and the image dimensions of the rectangle. According to this proportional relationship, the distance $d_i$ can then be easily calculated. Figure 6 shows the IBDMS set-up. It only requires a webcam and two given-location points $A$ and $B$, which could be two corners of a ground tile.

Calculation of Distance $w_i$ with PLDMS
Inheriting the concept of the IBDMS, a parallel-line distance measurement system (PLDMS) for measuring the distances $w_i$ is developed. Figure 7 shows the schematic diagram of the PLDMS. The computation relies on Equations (19), (20), and (21), where $(x_j^i, y_j^i)$ $(j = 1, 2, 3, 4)$ are, respectively, the image coordinates of the points $Ls_j^i$. In Figure 7, the number of pixels between $P$ and $Q$ can be obtained by Equation (21). In Figure 8, points $R$ and $S$ could be two corners of a known-dimension rectangle, i.e., a ground tile, in a captured image (image-domain), and $P$ and $Q$ are the cross points between the scan line $l_{PQ}$ and the lines $Ls_1^i Ls_2^i$ and $Ls_3^i Ls_4^i$. Because $P$, $Q$, $R$, and $S$ lie on the same scan line $l_{PQ}$, we can easily calculate the points $P$ and $Q$ by Equations (19) and (20), where $R$, $S$, and $(x_j^i, y_j^i)$ $(j = 1, 2, 3, 4)$ have been given by the image processing. In our experiments, the width between $R$ and $S$ is the width of the robot. Furthermore, the width $W_{RS}$ can be calculated by the proportion relationship defined in Equation (22).

Overall Procedures of the Proposed Localization Method
In the image processing stage, the coordinates of the robot are calculated through gray-scale transformation, background subtraction, binarization, morphological processing, connected-components labeling, and averaging. The background image is updated every $t_s$ seconds with the update Equation (23). After image processing, the coordinate in the image-domain goes through the IBDMS for calculating the distance $d_i$ (Equation (18)) and the PLDMS for calculating the distance $w_i$ (Equation (22)). Then, we can locate the robot in the real-world domain.
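The PLDMS geometry used in this procedure (a scan line crossing the two virtual lines through the tile corners, followed by a proportion) can be sketched as below. This is a hedged illustration rather than the paper's Equations (19)-(22): it assumes the scan line is a horizontal image row, that the real separation of the two virtual lines equals the known tile width, and that widths on the same row scale linearly with pixel counts. All names and numbers are hypothetical.

```python
def x_on_line_at_row(p1, p2, y):
    """x coordinate where the line through p1 and p2 crosses image row y."""
    (x1, y1), (x2, y2) = p1, p2
    if y1 == y2:
        raise ValueError("line is horizontal; no unique crossing with a scan row")
    return x1 + (x2 - x1) * (y - y1) / (y2 - y1)


def scan_line_crossings(ls1, ls2, ls3, ls4, y):
    """Cross points P and Q of row y with the lines ls1-ls2 and ls3-ls4."""
    return x_on_line_at_row(ls1, ls2, y), x_on_line_at_row(ls3, ls4, y)


def real_width(r_x, s_x, p_x, q_x, tile_width_cm):
    """Scale the pixel span R-S by the known real width of the span P-Q."""
    pq_pixels = abs(q_x - p_x)
    if pq_pixels == 0:
        raise ValueError("P and Q coincide; the proportion is undefined")
    return tile_width_cm * abs(s_x - r_x) / pq_pixels


# Hypothetical tile corners in image coordinates (x, y) and a 30 cm tile.
p_x, q_x = scan_line_crossings((100, 400), (150, 200), (300, 400), (250, 200), y=300)
w_rs = real_width(150, 250, p_x, q_x, tile_width_cm=30.0)
```

The same proportion can be reused for the offset between the robot center and either virtual line, which is how a distance to the wall could be derived once the position of the tile relative to the wall is known.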

IBDMS and PLDMS Set-Up
Some basic set-up steps must be performed for the IBDMS and PLDMS before we start the robot localization procedures. In order to alleviate the impact of image noise, the first background image is built by averaging 150 consecutive images, so that the background subtraction method can extract the foreground image much more effectively. In addition, a low-pass filter is adopted to further refine the background image. The obtained background images are shown in Figure 12. For the 1st webcam, as shown in Figures 13 and 14, the four corners of the known-dimension ground tile are used to draw a pair of virtual parallel lines ($Ls_1^1 Ls_2^1$ and $Ls_3^1 Ls_4^1$), whose linear equations are given by Equations (24) and (25) and are expressed in terms of the image coordinates of $Ls_1^1$, $Ls_2^1$, $Ls_3^1$, and $Ls_4^1$. The virtual parallel lines ($Ls_1^2 Ls_2^2$ and $Ls_3^2 Ls_4^2$) for the 2nd webcam are given by Equations (26) and (27), expressed in terms of the image coordinates of $Ls_1^2$, $Ls_2^2$, $Ls_3^2$, and $Ls_4^2$. The virtual parallel lines ($Ls_1^3 Ls_2^3$ and $Ls_3^3 Ls_4^3$) for the 3rd webcam are given by Equations (28) and (29).

Experimental Results
In our experiments, a remote-controlled tracked robot moves through the monitored areas. The path of the robot is shown in Figure 19a. The circled locations in Figure 19a, which cause the larger errors shown in Figure 19b, are the coordinates of the front arms of the robot at the moment when the robot moves into the coverage area of the 2nd webcam, as shown in Figure 19c. In Figure 19d, a larger error occurs at the circled locations, which are the coordinates of the front arms of the mobile robot at the moment when the robot moves into the coverage area of the 3rd webcam, as shown in Figure 19e. The error function used to evaluate the proposed localization method is given by Equation (30), which compares the actual coordinates with the coordinates measured by the proposed method. In Table 1, the measurement errors range from 2.24 cm (when the robot is near the webcams) to 12.37 cm (when the robot is far away from the webcam). Given the definition of the error function and the dimensions of the robot, which are 54 cm × 54 cm, a common size for patrolling robots, we find that the robot can be correctly located even when it is far away from the webcams. Considering the low resolution of the webcams, the measurement errors are acceptable, and they could be reduced further by using a high-resolution CCD camera. Furthermore, the known-dimension rectangle pattern should be selected so that it is clearly seen in the captured image, in order to set up the reference points of IBDMS and PLDMS. In addition, a suitable threshold value in the segmentation horizontal projection, which is used to update the background image, as well as the distortion coefficient vectors and camera conversion matrices in image calibration, are important factors for precisely locating the mobile robots.
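A localization error of this kind can be computed as sketched below, under the assumption that the error is the Euclidean distance between the actual and the measured positions in centimetres. That choice is an assumption of this sketch, and the coordinates in the usage lines are hypothetical, not entries of Table 1.

```python
import math


def localization_error(actual, measured):
    """Euclidean distance between an actual and a measured (x, y) position."""
    return math.hypot(actual[0] - measured[0], actual[1] - measured[1])


# Hypothetical positions in cm: offsets of (1, 2) and (3, 12).
near_error = round(localization_error((100, 200), (101, 202)), 2)
far_error = round(localization_error((0, 0), (3, 12)), 2)
```

Comparing such an error with the 54 cm side length of the robot gives a simple acceptance check: an error well below the robot footprint still places the estimate on the robot itself.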
Under the conditions of the static monitored area, it is assumed that the light sources and the locations of walls and furniture are given. The influence of lighting, therefore, can be easily attenuated by choosing appropriate factors in the image processing techniques. In this paper, we focus on locating the moving robot with a single webcam and have not yet considered partial occlusions. In this static indoor environment, some image techniques [33-35] could be used to overcome temporary partial occlusion.

The proposed method does not use complicated pattern recognition methods to identify the mobile robot, but rather just uses a simple formula to estimate distance. From the experimental results, the localization method is both reliable and effective.