Target Recognition and Localization of Mobile Robot with Monocular PTZ Camera

Target recognition and localization based on a vision sensor is an intuitive and effective approach. This paper designs a humanoid mobile robot with a monocular PTZ camera fixed on the differential drive platform of the mobile robot for target recognition and localization. The camera parameters are calibrated using the Zhang Zhengyou calibration method. Target recognition combines color and edge detection: first, objects of the same color are extracted by color recognition (by setting an appropriate threshold on the HSV color space of the acquired image); then the target is further extracted by edge detection (Hough circle transformation). Target localization is based on the Similar Triangle Principle: because the PTZ camera has pan, tilt, and zoom capabilities, the monocular vision sensor of the camera is used at different pitching angles to measure the distance between the robot and the target. Exploiting these characteristics of the monocular PTZ camera, the mobile robot realizes target localization and tracking. Simulation and experimental results demonstrate that the mobile robot achieves good target recognition and localization while tracking a moving target, which proves the effectiveness of the method.


Introduction
Target recognition and location based on monocular vision is of great practical value. In recent years, research on monocular vision-based target detection algorithms has attracted attention from scholars at home and abroad. In [1], a method using a single camera for monocular measurement is presented, based on image processing, to alleviate the effect of matching errors between corresponding feature points and extraction errors of single feature points; however, the maximum relative error still reaches 1.68% after data revision. In [2], the author assumes that a mobile robot is equipped with a single camera and a map marking the positions of landmarks in its environment. In [3], two different images were chosen from a sequence of images of the same target in different positions; feature points were then extracted and matched between the two images using the scale-invariant feature transform algorithm. By analyzing the different positions of the same feature point in different images, combined with the movement parameters of the aircraft, the relative information between the aircraft and the obstacle could be obtained. Reference [4] proposes a monocular vision measurement algorithm based on the color and texture of the ground. It improves ant colony optimization by incorporating the car model and the actual running environment, and the optimized algorithm outperforms other algorithms in run time. In order to reduce the deviation caused by feature point matching and optical axis inclination, [5] proposes a new method to measure the target distance for a wheeled mobile robot based on monocular vision, which extends the planar objective to three dimensions and achieves high measurement accuracy without adjustments. Experimental results indicate that the comprehensive error ratios of the proposed method are all under 0.7%, which satisfies the system requirements of instantaneity and reliability for monocular distance measurement of a wheeled mobile robot.
A sensor system is developed in [6] to measure the position and orientation of planar robots. In this system, monocular vision is integrated with a detection method for abstracting scale- and orientation-invariant image features. Instead of using multiple cameras, a monocular camera is used as the only sensing device to reduce the computational cost, and the scale- and orientation-invariant method guarantees robust detection and description of the features abstracted from an image. Experiments on a free-moving monocular camera verify the performance of the proposed system. Reference [7] researched a real-time location algorithm based on monocular vision: according to the principle of pinhole imaging, the mapping relationship between the imaging points and the target points is obtained and a pinhole model is established; the depth information of the image is then obtained through the geometric relationship between the image points and the target points. A range-finding algorithm based on monocular vision is proposed in [8], which realizes the 3D-to-2D transformation using a camera-captured image and obtains depth information; this method can accomplish target tracking and ranging in dynamic environments. Reference [9] proposed a kind of panoramic vision which can observe the visual field through 360 degrees without dead angles and can collect all the visual information in every spatial direction, but its disadvantages are that the robustness of environment feature extraction is poor and feature matching is difficult. However, these studies lack real-time tracking information over the entire moving process of the mobile humanoid redundant robot, since they do not adopt a PTZ camera with pan and tilt functions.
Considering the drawbacks of the previous studies, we propose target recognition methods based on color and geometry using a monocular PTZ camera. The target is located based on the Right Triangle Similarity Principle using the different pitching angles of the monocular visual sensor and the relative coordinates. Comparison of the simulation and experimental results demonstrates the effectiveness and real-time performance of the method.

Target Recognition Based on Multifeature Fusion of Dual-Arm Mobile Robot
The paper models target recognition by combining the color and the geometric shape of the target. The experimental results prove that the method has good real-time performance, effectiveness, and practicability, which lays a solid foundation for follow-up studies in which the humanoid robot fully tracks and operates on the object. Thus the target recognition of a mobile robot with high redundancy has a wide range of theoretical and practical value.

Target Recognition Based on Color.
A color target is extracted according to its color characteristics and the rest of the image is discarded. In this paper the target recognition system based on the monocular PTZ camera consists of the dual-arm mobile robot shown in Figure 1(a), the monocular PTZ camera shown in Figure 1(b), and the differential driving platform shown in Figure 1(c). The main product parameters and the internal parameter matrix of the monocular PTZ camera are shown in Table 1 and Equation (3), respectively. The internal parameters of the PTZ camera were obtained by the Zhang Zhengyou calibration method, as described in Section 3.1.
Target recognition based on color consists of image capture, a series of image processing steps, and image display. This paper uses the 2-Mode threshold segmentation method on the HSV color space model to detect and identify the targets.

Threshold Segmentation and Image Binarization. Threshold segmentation is a region-based image segmentation technique which divides image pixels into several categories. It is one of the most commonly used traditional image segmentation methods and has become the most basic and widely applied segmentation technology because of its simple implementation, small computation, and stable performance. It is especially suitable for images whose targets and backgrounds occupy different gray-level ranges. It not only greatly compresses the amount of data but also greatly simplifies the analysis and processing steps; therefore, in many cases, it is a necessary preprocessing step before image analysis, feature extraction, and pattern recognition. Threshold segmentation first binarizes the image, then determines a suitable threshold and compares each pixel with it: if the pixel value is larger than the threshold, the pixel belongs to the target (black); if it is smaller, it belongs to the background (white). The gray threshold segmentation transformation can be described as

$g(x, y) = \begin{cases} 0 \text{ (target)}, & f(x, y) > T \\ 255 \text{ (background)}, & f(x, y) \le T \end{cases}$

where T represents the threshold of the binary image.
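As a minimal sketch of this binarization rule (pure NumPy, following the text's convention that pixels brighter than T become the black target and the rest the white background):

```python
import numpy as np

# Sketch of the gray threshold segmentation described above.
# Convention from the text: f(x, y) > T -> target (0, black),
# otherwise background (255, white).
def binarize(gray, T):
    """Apply the threshold T to a 2-D gray image and return a binary image."""
    return np.where(gray > T, 0, 255).astype(np.uint8)
```

For example, with T = 150 a pixel of value 200 maps to 0 (target) and a pixel of value 10 maps to 255 (background).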
There are many algorithms for image segmentation. The object recognition in this paper involves a simple image background, so the 2-Mode image segmentation method can be used. Threshold selection in the 2-Mode method exploits the fact that each region is made up of many pixels with similar gray levels: if there is a big difference between the objects and the background in the image, a bimodal distribution will appear in the gray-scale histogram, and an appropriate threshold can then be selected for segmentation. When the histogram shows two obvious peaks, the valley value between the two peaks is chosen as the best threshold.
The gray-level distribution with two peaks is shown in Figure 3. The vertical values of the peaks are Hmax1 and Hmax2, respectively, and their corresponding gray values are T1 and T2. The idea of bimodal image segmentation is then to find the lowest valley between the two peaks of the histogram, that is, to find the threshold $T = \arg\min_{T_1 \le t \le T_2} H(t)$.
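The valley search can be sketched as follows; the peak-suppression window used to locate the second peak is an illustrative assumption, not a detail from the paper:

```python
import numpy as np

# Sketch of 2-Mode (bimodal) threshold selection: find the two histogram
# peaks Hmax1/Hmax2 at gray levels T1/T2, then take the lowest valley
# between them as the segmentation threshold T.
def bimodal_threshold(hist, peak_window=20):
    """hist: length-256 gray-level histogram. Returns the valley level T."""
    t1 = int(np.argmax(hist))                 # first peak T1
    masked = hist.copy()
    lo, hi = max(0, t1 - peak_window), min(256, t1 + peak_window + 1)
    masked[lo:hi] = 0                         # suppress T1's neighborhood
    t2 = int(np.argmax(masked))               # second peak T2
    a, b = sorted((t1, t2))
    return a + int(np.argmin(hist[a:b + 1]))  # lowest valley between peaks
```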

Target Recognition Experiment Based on
Color. The target in this paper is a green tennis ball. The green objects are well identified in Figure 8(b) when objects of different colors are used, as can be seen by comparing it with Figure 8(a), by color recognition.

Target Recognition
Based on Edge Detection. Using color segmentation, the possible region of the target is obtained.
Then target recognition based on edge detection is performed, which reduces the data processing workload on the original image, increases the success rate of target recognition, and improves testing efficiency. Edge detection is a basic and practical tool in vision research; the edge detection algorithm is usually carried out after image enhancement, color threshold segmentation, and filtering, all of which are tuned by adjusting threshold values.

Canny Edge Detection
Method. The purpose of image edge detection is to enhance the edge information of objects in the image and to reduce the interference of useless information to a certain extent, so as to serve later image processing. The paper adopts the Canny edge detection algorithm. Canny edge detection first applies a 5 × 5 convolution kernel for Gaussian blur (this Gaussian blur is a part of the Canny algorithm itself, not the same step as the blurring in the previous section), then uses a pair of convolution matrices to calculate the gradient direction and magnitude, and applies non-maximum suppression to filter out spurious edge candidates. Finally, the edges of objects in the image are determined by hysteresis thresholding (consisting of a low threshold and a high threshold): a pixel is regarded as an edge pixel when its value exceeds the high threshold and as a nonedge pixel when its value is below the low threshold. It can be seen from Figure 7(c) that Canny edge detection can detect the objects in the image (including the target) well when the parameters are selected properly.

Hough Transform for Circle Detection.
Hough Transform is one of the basic methods to recognize geometric shapes from images in image processing.The basic principle of Hough Transform is to transform a given curve in the original image space into a point in the parametric space by using the duality of points and lines.In this way, the problem of detecting a given curve in the original image is transformed into the problem of finding the peak value in the parameter space, that is, to transform the overall detection characteristics into detection of local characteristics, such as straight lines, ellipses, circles, and arcs.
For more precise target recognition, the Hough Transform for circle detection is applied for further recognition after the color-based and edge-based recognition. The basic principle of Hough circle detection is to accumulate votes in the three-dimensional parameter space (a, b, r) and complete the detection by statistics: each edge point, transformed by (2), corresponds to a cone surface in the parameter space. After all the edge points are transformed, a cluster of cones intersects at a point, and the number of cones intersecting at the same point is accumulated. If this number exceeds a set threshold, the circle parameters are obtained.
The circle transformation of (2) is

$(x - a)^2 + (y - b)^2 = r^2$,

where (x, y) represents pixel coordinates in the image, (a, b) the circle center, and r the radius. The Hough Transform can accurately select geometric shapes with specific requirements from images containing multiple geometric shapes.

Edge Detection Experiment Based on Canny Edge Detection and Hough Circle Detection.
According to the method in Section 2.1.2, we recognize the green objects, as shown in Figure 7(b). According to the method in Section 2.2.1, when targets of different geometric shapes are used, the contours of every target are well represented in Figure 7(c), compared with Figure 7(b), by the geometric shape recognition. According to the method in Section 2.2.2, we further recognize the circular target in Figure 7(d), compared with Figure 7(c), by the Hough circle detection. Similarly, when the target is in a more complex environment, as shown in Figure 9(a), the contours of every target are well represented by Canny edge detection in Figure 9(b) and the circular target is recognized by Hough circle detection in Figure 9(c). Experiments show that this method has a good recognition effect and meets the requirements of target recognition.

Target Location Based on Similar Triangle Theory of Dual-Arm Mobile Robot
Compared with distance measurement based on binocular vision, monocular vision ranging is cheaper, structurally simpler, algorithmically simpler, and more practical. Combined with the fact that a PTZ (Pan/Tilt/Zoom) camera can speed up target tracking, the paper adopts the monocular PTZ camera to locate the target.

Zhang Zhengyou Calibration Method.
In this method, we first make a planar calibration board, then use the camera to photograph the board from different directions (usually ten to twenty pictures), process the images with the camera calibration tool of MATLAB, and finally calculate the camera's internal parameter matrix.
There is a one-to-one correspondence between each feature point on the calibration board (corner points extracted by the Harris algorithm) and the corresponding image point in its image, which can be expressed by a homography matrix.
For each image, a corresponding homography matrix can be determined, which provides constraints for the solution of the internal parameters. The algorithm follows a two-step approach: first, initial values of the parameters are obtained by a linear method; then the linear results are refined nonlinearly by considering radial distortion under the maximum likelihood criterion. Finally, the external parameters are obtained from the calculated internal parameters and the homography matrices. The space map of Zhang Zhengyou's planar calibration is shown in Figure 9.
The internal parameter matrix of the monocular PTZ camera obtained by the Zhang Zhengyou calibration method is given in Equation (3). This paper adopts the monocular vision location method with the PTZ camera; the schematic of monocular vision-based distance measurement is presented in Figure 10.
$O_c$ is the centre point of the lens; $O(x_0, y_0)$ is the intersection of the optical axis with the image plane and the origin of the image plane; $P'(x, y)$ is the projection of the detected point $P$ onto the image plane, as shown in Figure 10.

The relevant relationships can be described as follows. The distance $D$ between the mobile robot and the target can be derived from (4), (5), and (6), where $h$ and $\alpha$ are known and $d^2 = x^2 + y^2$ holds. The unit of $(u, v)$ is the pixel; $O'(u_0, v_0)$ is the frame-memory coordinate of the intersection $O(x_0, y_0)$ of the camera optical axis and the image plane; $P''(u, v)$ is the frame-memory coordinate of $P'(x, y)$; the physical dimensions on the image plane corresponding to one frame-memory pixel along the X axis and the Y axis are $dx$ and $dy$, respectively. The transformation from $P_{Im} = [x\ y\ 1]^T$ in image coordinates to $P_{uv} = [u\ v\ 1]^T$ in pixel coordinates can be described by the homogeneous transformation

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \;\Rightarrow\; x = (u - u_0)\,dx, \quad y = (v - v_0)\,dy.$$

With $f_x = f/dx$ and $f_y = f/dy$, (11) is deduced from (10), where $f_x$, $f_y$, $u_0$, $v_0$ are the intrinsic parameters of the monocular PTZ camera obtained by the Zhang Zhengyou calibration method. In the end the distance $D$ between the measured point $P$ and the monocular PTZ camera can be derived from (9), (10), and (11), as shown in (12).
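The geometry above admits a compact closed form. The expression below is a standard monocular ground-plane ranging formula consistent with the similar-triangle setup described ($h$: camera height, $\alpha$: tilt of the optical axis below the horizontal, $v_0$ and $f_y$ from the Zhang calibration); it is offered as a hedged sketch rather than a transcription of the paper's Equation (12):

```python
import math

# Sketch of similar-triangle monocular ranging: the pixel row v of the
# target's image point is converted to an angular offset from the optical
# axis, added to the camera tilt, and intersected with the ground plane.
def monocular_distance(v, v0, fy, h, alpha):
    """Horizontal distance D to a ground target imaged at pixel row v."""
    beta = math.atan((v - v0) / fy)   # angular offset of P'' from the axis
    return h / math.tan(alpha + beta)
```

When the target projects onto the optical axis (v = v0), this reduces to D = h / tan(alpha), the plain right-triangle case.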

Realization of Target Recognition and Location Based on Monocular PTZ Camera of the Mobile Robot
Considering the characteristics of the mobile robot and the target movement, the image is captured at a rate of 10 frames per second and processed to assure the real-time requirement of the dual-arm mobile robot system. The specific processes are listed as follows.
Step 1. Build a dual-arm mobile robot and mount visual system on the platform of the robot.
Step 2. Calibrate the PTZ camera using the Zhang Zhengyou calibration method and get its internal parameters.
Step 3. Perform image processing: selecting a proper camera, reading the image data per frame, rectifying distortion, histogram equalization of the acquired color image, morphological filtering, and so on.
Step 4. Adopt the combined target recognition method based on color and the geometric shape of the target.
Step 5. Use the similarity principle of right-angled triangle to measure the distance between the target and the mobile robot.
Step 6. Confirm the next motion status of the mobile dual-arm robot and return to Step 3, repeating the loop.
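The steps above can be sketched as a processing loop. Every helper here is a hypothetical placeholder (the toy recognizer just picks the most green-dominant pixel), standing in for the calibration, recognition, and ranging routines of the previous sections:

```python
import math
import numpy as np

# Hypothetical stand-in for Step 4: return the (u, v) pixel of the most
# green-dominant pixel, or None when no green pixel exists.
def recognize_target(frame_bgr):
    green = frame_bgr[:, :, 1].astype(int) - frame_bgr[:, :, [0, 2]].max(axis=2)
    if green.max() <= 0:
        return None
    v, u = np.unravel_index(np.argmax(green), green.shape)
    return u, v

# Steps 3-6 as a loop over captured frames; v0 and fy come from Step 2
# (calibration), h and alpha from the camera mounting (illustrative values).
def tracking_loop(frames, v0=240, fy=800.0, h=0.5, alpha=0.5):
    distances = []
    for frame in frames:                       # Step 3: read frame data
        target = recognize_target(frame)       # Step 4: recognition
        if target is None:
            continue                           # no target: loop to Step 3
        _u, v = target
        beta = math.atan((v - v0) / fy)        # Step 5: similar triangles
        distances.append(h / math.tan(alpha + beta))
    return distances                           # Step 6 would act on these
```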

Experiment and Analysis
To verify the validity of this method, we carried out several verification experiments separately.

Experiment I: Target Recognition Experiment.
Besides the target in simple scenes, as shown in Figure 7, an experiment with the target placed in a complex real-world scene is also carried out, as shown in Figure 11. From the point of view of the results, the circular target can still be detected well.

Experiment II: Target Location Experiment.
The target is placed from near to far, as shown in Figure 12. The actual distance, the visual ranging between the mobile robot and the target, and the error between the two distances are shown in Table 2.
As seen from Table 2, when the distance is less than 600 mm (α = -25°), the relative error is larger. The italicized entries mark the measurements whose distance error is less than one percentage point, together with the better tilt angle for the corresponding measurement distance; within this range the ranging method based on the PTZ visual sensor works well for the mobile robot. The shortcoming at close range can be compensated by the movement of the mobile robot together with the pan and tilt of the PTZ camera and the correlation algorithm. Therefore the recognition and location method based on the PTZ visual sensor of the mobile robot shows good practicality and reliability.

Conclusions and Discussion
This paper sets up a visual recognition system based on color and geometric shape recognition (Hough circle transformation) and a visual location system based on the similarity principle of right-angled triangles for the mobile robot. Exploiting the characteristics of the monocular PTZ vision sensor, the mobile robot realizes target localization and tracking in real experiments. The experimental results prove the effectiveness and practicability of the two methods. In future work, research on target recognition and localization of the mobile robot combining the monocular PTZ camera with a laser sensor will be carried out to improve the accuracy of target recognition and localization of the mobile robot.

2.1.1.
Imaging Processing Method. The imaging processing methods include gray level conversion, histogram computation, threshold segmentation, image binarization, edge detection, and color component extraction.

Gray-Scale Histogram. The histogram is a simple and practical tool in image processing, used to understand the feature distribution of an image. A gray-scale image usually has 256 gray levels, and the gray histogram records the number of pixels at each gray level: the left side of the histogram represents the dark part, the right side the bright part, and the middle the middle tones. The histogram thus directly provides the gray-level distribution information. Figure 2(b) is the gray-scale image and Figure 2(c) the histogram corresponding to the original image in Figure 2(a).
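The gray-scale histogram described above is a per-level pixel count, sketched here in NumPy:

```python
import numpy as np

# Sketch of the 256-level gray histogram: entry g counts the pixels whose
# gray value is g, so the left side of the array is the dark part and the
# right side the bright part.
def gray_histogram(gray):
    """gray: 2-D uint8 image. Returns a length-256 count array."""
    return np.bincount(gray.ravel(), minlength=256)
```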

Figure 1:
Figure 1: (a) Dual-arm mobile robot with high redundancy, (b) monocular PTZ camera, and (c) monocular vision system fixed on the mobile platform.

Figure 2: Figure 3:
Figure 2: (a) Original photo, (b) gray-scale map, and (c) histogram of the gray-scale image (the abscissa represents the gray level and the ordinate the frequency of each gray level, that is, the number of pixels).

Figure 2(a) is converted into the binary image shown in Figure 4(a) using a threshold value, and the binary image with a threshold of 150 is shown in Figure 4(b).
When the pixel value of a pixel lies between the two thresholds, it is regarded as an edge pixel only if it is connected to edge pixels. The original image and the effect diagram of Canny edge detection are shown in Figures 7(b) and 7(c), respectively.

Figure 5:
Figure 5: (a) Original photo, (b)∼(c) image and corresponding histogram of R component for target, (d)∼(e) image and corresponding histogram of G component for target, and (f)∼(g) image and corresponding histogram of B component for target.

Figure 6:
Figure 6: (a)∼(b) Image and corresponding histogram of H component for target, (c)∼(d) image and corresponding histogram of S component for target, and (e)∼(f) image and corresponding histogram of V component for target.

Figure 7: Figure 8: Figure 9:
Figure 7: Target effect images with different minimum threshold values in the S component: (a) effect image with minimum threshold 50, (b) with minimum threshold 100, (c) with minimum threshold 150, and (d) with minimum threshold 200.

Figure 11:
Figure 11: (a) Original photo, (b) effect diagram of Canny edge detection, and (c) effect diagram of Hough transform for circle detection.
The effect diagram of Canny edge detection is shown in Figure 11(b) and the effect diagram of the Hough transform for circle detection in Figure 11(c) (the specific principles and operation processes are described in Sections 2.1 and 2.2).

Figure 12:
Figure 12: (a)-(e) Different positions of the target from near to far.

Table 1:
Main product parameters of the monocular PTZ camera.

Table 2:
Comparison of the actual distance and the visual ranging between the target and the mobile robot from near to far with different tilt angles.

The relative algorithm will be optimized in further experiments, including the study of multisensor recognition and high-precision location for the redundant mobile dual-arm robots.