A Survey on Visual Perception for RoboCup MSL Soccer Robots

Visual perception is the most important method for providing information about the competition environment for RoboCup Middle Size League (MSL) soccer robots. The paper reviews the advancement of visual perception in RoboCup MSL soccer robots from several points of view including the design and calibration of the vision system, the visual object recognition, the estimation of the object's motion, robot visual self-localization and multi-robot cooperative sensing. The research progress we have achieved is also introduced in this review. The developing trends and the future research focuses on this problem are also discussed.


Introduction
RoboCup is an international research and education initiative to foster research into artificial intelligence and intelligent robots by providing a standard test-bed where a wide range of technologies can be tested and integrated. The final goal of RoboCup is to develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team by 2050.
RoboCup includes RoboCup Soccer (Simulation League, Small Size League, Middle Size League, Standard Platform League), RoboCup Rescue, RoboCup@home, RoboCup Junior, etc. In the RoboCup Middle Size League (MSL), all the robots are totally distributed and autonomous and must use their own sensors to obtain environment information and use their own computer to process sensor information and realize decision making, planning and control. Wireless communication with a limited bandwidth can be used to help coordination and cooperation with teammates.
Because all the MSL robots are totally distributed and autonomous, all the robot sensors are on-board and sensor information acquisition and processing are also performed by the on-board computer. Commonly used sensors include the vision system, the motor encoder, the digital compass, the gyro and so on. Because the vision system is relatively low-cost and it can provide the richest environment information, visual perception has become the most important method for realizing object recognition, estimation of the object's motion and self-localization for soccer robots.
During the competition, up to 5 robots per team are allowed to play on an 18 m × 12 m field. A typical competition scene is shown in Figure 1. As the competition has become more and more fierce, RoboCup MSL has become an increasingly dynamic environment. Therefore, visual perception should run in real-time with high accuracy and high robustness. In this paper, we review the advancement of visual perception in RoboCup MSL soccer robots from the following aspects: the design and calibration of the vision system, the visual object recognition, the estimation of the object's motion, robot self-localization and multi-robot cooperative sensing. The research progress achieved by our team, NuBot, is also introduced. Furthermore, because many problems in visual perception for MSL soccer robots are common in computer/robot vision research, this survey is also of value for other researchers in the computer/robot vision community. The remaining sections are organized as follows: the design and calibration of the robot vision system are introduced in Section 2, visual object recognition is introduced in Section 3, the estimation of the object's motion is presented in Section 4, robot self-localization is presented in Section 5, multi-robot cooperative sensing is introduced in Section 6, the developing trends and the research focuses are discussed in Section 7 and Section 8 concludes the paper.

The design and calibration of the vision system
The catadioptric omnidirectional vision system is one of the most popular sensors for the RoboCup MSL soccer robots and has been used by almost all of the MSL soccer robots, as shown in Figure 1. It consists of a convex mirror and a camera pointed upward towards the mirror [1], as shown in Figure 2. This system can provide a 360° view of the robot's surrounding environment in a single image and robots can use it to realize object recognition by image processing and understanding and self-localization by fusing the odometry information from motor encoders. Therefore, the perception information about the environment can be provided for robot control, planning, multi-robot cooperation and coordination. Some MSL teams also use the perspective camera as the front vision system to assist the omnidirectional vision system for dribbling and controlling the ball precisely. Because the intrinsic and extrinsic parameters can be calibrated by using the released toolbox for the perspective camera, we will not discuss this issue in this section and only the design and calibration of the omnidirectional vision system are presented.

The design of the vision system
The characteristics of an omnidirectional vision system are determined mostly by the shape of the panoramic mirror. According to the different imaging principles, omnidirectional vision can be divided into single-viewpoint and non-single-viewpoint omnidirectional vision. The hyperboloidal mirror, the parabolic mirror and the ellipsoidal mirror can be used to construct a single-viewpoint omnidirectional vision system. The conic mirror, the spherical mirror, the horizontally isometric mirror and the vertically isometric mirror can be used to construct a non-single-viewpoint omnidirectional vision system. The imaging theory and characteristics of omnidirectional vision using the conic mirror, the spherical mirror, the hyperboloidal mirror, the parabolic mirror and the ellipsoidal mirror can be found in detail in [2]. To make the imaging resolution of the scene constant horizontally, vertically and angularly, three mirrors with constant imaging resolution were designed in [3], called the horizontally isometric mirror, the vertically isometric mirror and the angularly isometric mirror, respectively. The most commonly used mirror is the hyperboloidal mirror, which is used by the robots from Tribots [4], RFC Stuttgart [5], Tech United [6], CAMBADA [7], etc. The main deficiency of this kind of mirror is that the imaging resolution decreases greatly as the distance to the robot increases and the imaging of the objects far from the robot is quite small, which is not suitable for robots to realize object recognition on a large scale. A multi-part mirror consisting of the horizontally isometric mirror, the constant curvature mirror and the planar mirror was designed in [8]. The NuBot team designed a new panoramic mirror, which consists of the hyperboloidal mirror, the horizontally isometric mirror and the vertically isometric mirror from interior to exterior [9,10]. The profile curve of the panoramic mirror is demonstrated in Figure 3. The new omnidirectional vision system using this mirror not only makes the imaging resolution of the objects near the robot on the field constant and the imaging distortion of the objects far from the robot small in the vertical direction, but also enables the robot to acquire very clear imaging of the scene that is very close to it, including the robot itself.

The calibration of the vision system
Only after finishing the distance map calibration from the image coordinate to the robot-centred real world coordinate can the omnidirectional vision be applied to make a visual measurement. In the past few years, the calibration of the single-viewpoint omnidirectional vision has been deeply researched [11][12][13][14] and Scaramuzza [13] and Mei [14] have developed Matlab toolboxes for this kind of omnidirectional vision. However, these traditional calibration methods need the assumption that the mirror axis and the camera's optical axis are coincident and that the shape of the panoramic mirror is symmetric. In the RoboCup MSL, there are lots of shocks and stresses on the omnidirectional vision system during transport of the robots and during frequent crashes in the competition, so this assumption is easily violated and at the very least hard to meet when (re-)assembling the omnidirectional vision in the competition venue. Therefore, the calibration accuracy cannot be guaranteed by using these methods.
A calibration method for the non-single-viewpoint omnidirectional vision was proposed in [15], where the assumption that the mirror axis and the camera's optical axis are coincident is not needed, but the assumption that the shape of the panoramic mirror has to be symmetric is still needed. A general solution was developed to calibrate the omnidirectional vision by exploring a back-propagation ray-tracing approach and the geometric properties of the mirror surface [16,17], so the non-single-viewpoint misalignment from the imperfect mechanical setup and the use of a low cost camera could be compensated for. In [18] an efficient evolutionary approach was applied to calibrate the omnidirectional vision automatically after extracting the features and landmarks of the field known in advance from an image captured in a known pose. A model-free method was proposed in [19] to calibrate the omnidirectional vision for the robots in the Tribots team without needing the assumptions mentioned above. In this method, a series of the edge points in a calibration patch are extracted as the support vectors in several pre-defined directions and then these support vectors are used for interpolation to obtain accurate distance mapping between the image coordinates and the real world coordinates.
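The interpolation idea behind such a distance mapping can be illustrated with a minimal sketch. It is not the method of [19]: it assumes a single radial direction, linear interpolation, and hypothetical calibration samples (the arrays `pixel_radius` and `field_dist_m` and the function name are invented for illustration).

```python
import numpy as np

# Hypothetical calibration samples for ONE radial direction:
# pixel distance from the image centre -> metric distance on the field.
pixel_radius = np.array([20.0, 60.0, 100.0, 140.0, 180.0])
field_dist_m = np.array([0.3, 1.0, 2.2, 4.5, 9.0])

def pixel_to_distance(r_pixels):
    """Linearly interpolate the field distance for a given pixel radius."""
    return float(np.interp(r_pixels, pixel_radius, field_dist_m))
```

A full mapping would hold one such table per pre-defined direction and would also interpolate between neighbouring directions.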

The visual object recognition
According to the current MSL rules, although colour goals have been replaced with white goal nets and colour goalposts have been removed, the competition environment is still colour-coded. Therefore, the basic abilities for MSL robots are to recognize the yellow ball, the green field, white lines and other black robots. The final goal of RoboCup is that the robot soccer team defeats a human championship team, so robots should be able to play in competitions under highly dynamic lighting conditions, even outdoors. Furthermore, according to the current rules of RoboCup MSL, the illumination is not specified and the technical challenge of playing with an arbitrary FIFA ball has been introduced. Therefore, making visual object recognition work robustly under varying lighting conditions in the colour-coded MSL environment, and even without the constraints of colour coding (for example, recognizing ordinary FIFA balls), has become a challenging research focus. Many researchers have tried to solve this problem from the following aspects.

Image acquisition
Several researchers have tried to make the acquired image of the vision system describe the environment as consistently as possible under different lighting conditions by adjusting camera parameters in image acquisition, so as to improve the robustness of the visual object recognition. The camera parameters discussed here are image acquisition parameters, not the intrinsic or extrinsic parameters in camera calibration.
Grillo et al. defined camera parameter adjustment as an optimization problem and used the genetic meta-heuristic algorithm to solve it by minimizing the distances between the colour values of image regions selected manually and the theoretical values in the colour space [20]. The theoretical colour values were used as reference values, so the effect from illumination could be eliminated, but special image regions must be selected manually by users in this method. Takahashi et al. used a set of PID controllers to modify the camera parameters like gain, iris and two white-balance channels according to the changes of a white reference colour, which is always visible in the omnidirectional vision system [21]. Lunenburg and Ven adjusted the shutter time by designing a PI controller to modify the colour values of the referenced green field to the desired values [22]. Neves et al. proposed an algorithm for the autonomous setup of camera parameters such as exposure, gain, white-balance and brightness for their omnidirectional vision [17,23], according to the intensity histogram of the images and a black and a white region known in advance. In this case a colour patch including the black and white region is required on the field, so it can only be used off-line before the competition.
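The controller-based adjustment can be sketched as a simple feedback loop. The following is only an illustration of a PI controller driving a reference brightness to a target value, not the controller of [22]: the linear camera response, the gains `kp` and `ki`, the target value and all names are hypothetical.

```python
def simulate_pi_shutter(target=120.0, gain=8.0, steps=200, kp=0.02, ki=0.01):
    """Drive the measured brightness of a reference region to `target`.

    Assumption: the camera responds linearly, brightness = gain * shutter.
    Returns the final shutter value and the brightness it produces.
    """
    shutter, integral = 1.0, 0.0
    for _ in range(steps):
        measured = gain * shutter      # hypothetical camera response
        error = target - measured      # deviation from the desired value
        integral += error              # accumulated error (I term)
        shutter = kp * error + ki * integral
    return shutter, gain * shutter
```

With these illustrative gains the loop settles at the shutter value for which the reference region reaches the target brightness; a real camera would additionally need saturation limits on the shutter time.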
Because some kind of reference colour is needed in the four methods mentioned above, the NuBot team proposed a novel method to auto-adjust the camera parameters based on image entropy [10,24]. Image entropy was defined by using Shannon's entropy and then was verified by experiments to indicate whether the camera parameters are properly set, so that the camera parameters can be auto-adjusted to make the output of the vision system adaptive to varying lighting conditions. The experimental results show that some kind of colour constancy for the output of the omnidirectional vision can be achieved and robust object recognition can be realized under varying lighting conditions for soccer robots. Furthermore, unlike other methods, this method needs no referenced colour during the adjusting process, so it can be applied in more computer/robot vision situations.
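As a rough illustration of the entropy criterion, the sketch below computes Shannon's entropy of the grey-level histogram of an 8-bit image and picks, among several candidate images, the one with the highest entropy. The exact entropy definition and search procedure in [10,24] may differ; the function names are hypothetical.

```python
import numpy as np

def image_entropy(gray):
    """Shannon entropy (in bits) of the histogram of an 8-bit grey image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                       # skip empty bins, since 0*log(0) = 0
    return float(-(p * np.log2(p)).sum())

def pick_best_setting(images):
    """Index of the candidate image (one per camera setting) with maximal entropy."""
    return int(np.argmax([image_entropy(img) for img in images]))
```

An under- or over-exposed image concentrates its histogram in a few bins and therefore scores a low entropy, which is the intuition behind using entropy as an indicator of well-set parameters.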

Colour calibration and learning
In the colour calibration and learning of the vision system, the traditional methods, such as choosing the threshold to classify the colour [25] or building a colour lookup table [26] off-line through a human-computer interface, would not meet the requirements for segmenting the image and recognizing the object robustly when the lighting conditions fluctuate during the competition [27]. Furthermore, off-line calibration is time-consuming. Therefore, several on-line colour calibration and learning methods have been proposed in [28][29][30]. In these methods, the field line points are first extracted without colour classification to realize the robot's self-localization and then according to the known environment model and the self-localization result, several kinds of object regions can be searched for or detected, so the colour lookup table can be built up to realize colour auto-calibration, which can be adjusted in real-time during the competition process to make object recognition adaptive to the changes in illumination.
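A colour lookup table of the kind mentioned above can be sketched as follows. This is a generic illustration, not the scheme of [26] or [28][29][30]: the 4-bit quantisation per channel, the label numbering (0 meaning unknown) and the function names are assumptions.

```python
import numpy as np

BITS = 4  # assumed quantisation: 16 levels per colour channel

def build_lut(samples):
    """samples: dict mapping a label id (>0) to an (N, 3) array of 8-bit pixels."""
    lut = np.zeros((1 << BITS,) * 3, dtype=np.uint8)   # 0 = unknown colour
    for label, pixels in samples.items():
        q = np.asarray(pixels, dtype=np.uint8) >> (8 - BITS)
        lut[q[:, 0], q[:, 1], q[:, 2]] = label
    return lut

def classify_image(lut, image):
    """Label every pixel of an (H, W, 3) 8-bit image by table lookup."""
    q = np.asarray(image, dtype=np.uint8) >> (8 - BITS)
    return lut[q[..., 0], q[..., 1], q[..., 2]]
```

In an on-line scheme, `build_lut` would be re-run with pixel samples taken from regions whose object class is known from the environment model and the current self-localization result.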

Image processing, analysis and understanding
In image processing, some researchers processed and transformed the images to achieve some kind of constancy, such as colour constancy [31] by the Retinex algorithm, to improve the robustness of colour classification and object recognition. However, the computation cost of this method is usually high and it is not suitable for the highly dynamic MSL competition.
In image analysis and understanding, the dynamic lighting conditions bring a great challenge to traditional object recognition methods, which segment the image first and then detect the colour blobs. Therefore, several object recognition algorithms that do not depend on colour segmentation have been proposed [21,32,33]. In [21], a Markov Random Field was used to segment the panoramic image and then, based on the assumption that the distribution of the object colour is Gaussian, each pixel of the image was classified to be an object colour according to its Mahalanobis distances to the Gaussian distributions of all of the reference object colours. The experimental results in the indoor and outdoor environments validated the effectiveness of this method. In [32], a robust algorithm was presented to recognize the orange ball. The image was segmented according to the Bayes classifier based on the colour histogram in UV colour space and then the ball was detected by using a randomized Hough transform. Finally, the colour histogram could be updated to adapt to the changes in illumination according to the recognition results.
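The Mahalanobis-distance classification step can be illustrated with a short sketch. It covers only the per-pixel Gaussian colour test, not the Markov Random Field segmentation of [21]; the rejection threshold `max_dist_sq` and all names are hypothetical.

```python
import numpy as np

def fit_gaussian(samples):
    """Mean and inverse covariance of (N, 3) reference colour samples."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return mean, np.linalg.inv(cov)

def mahalanobis_sq(pixel, mean, inv_cov):
    """Squared Mahalanobis distance of a pixel colour to one colour model."""
    d = np.asarray(pixel, dtype=float) - mean
    return float(d @ inv_cov @ d)

def classify_pixel(pixel, models, max_dist_sq=9.0):
    """Return the label of the closest colour model, or None if all are too far."""
    best_label, best_d = None, max_dist_sq
    for label, (mean, inv_cov) in models.items():
        d = mahalanobis_sq(pixel, mean, inv_cov)
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```

Unlike fixed thresholds in colour space, the distance is scaled by the spread of each reference colour, so a colour class with large natural variation tolerates larger deviations.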
In recent years, object recognition without any colour classification, especially the recognition of arbitrary FIFA balls, has become a research focus in the robot vision of MSL [34][35][36][37][38][39][40][41][42][43][44]. A so-called Contracting Curve Density (CCD) algorithm [34][35][36] was proposed by Hanek et al. to recognize soccer balls without colour labelling. This algorithm fits parametric curve models to image data by using local criteria based on local image statistics to separate adjacent regions. The contour of the ball could be extracted even in cluttered environments under different illumination, but the approximate position of the ball needed to be known in advance. Therefore, global detection could not be realized using this method. Treptow and Zell integrated the Adaboost feature learning algorithm into a condensation tracking framework [37], so a ball without a special colour could be detected and tracked in real-time even in cluttered environments. Mitri et al. presented a novel scheme [38] for fast colour invariant ball detection, in which the edge-filtered images serve as the input of an Adaboost learning procedure that constructs a cascade of classification and regression trees. Different soccer balls could be detected by this method in different environments, but the false positive rate was high when other round objects were introduced into the environment. They then combined a biologically inspired attention system, VOCUS [39], with the cascade of classifiers. This combination made their ball recognition highly robust and eliminated false detection effectively. Coath and Musumeci proposed an edge-based arc fitting algorithm [40] for soccer robots to detect the ball. Bonarini et al. used a circular Hough transform on the edges extracted from a colour invariant transformation algorithm to detect the generic ball and a Kalman Filter was also applied to track and predict the position of the ball in the next image to reduce the computational load [41]. An advanced version of the Hough transform was proposed to detect the ball without colour information by using the structure tensor technique in [42], but this method is time-consuming and cannot be run in real-time.
All the algorithms mentioned above were used only with the perspective camera, in which the field of view was far smaller and the image was also much less complex than that of an omnidirectional vision system. Some researchers have recently used omnidirectional vision systems to recognize arbitrary FIFA balls [5,17,43]. Because their panoramic mirrors are hyperbolic, the balls are imaged as circles in the panoramic images. Martins et al. used a Canny operator to detect the edges and then applied the circular Hough transform to detect all of the candidate circles imaged by the balls [17,43]. An effective validation process was proposed to discard the false positives. Zweigle et al. used a standard Hough transform to detect all the circles in the panoramic image and then extracted the colour histogram for each circle and compared it with the colour histogram learned in the off-line calibration process to validate the real FIFA balls [5]. Experimental results showed that the correct detection rates of these two methods were very high. However, all the above experiments were performed in very simple environments.
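The voting principle of the circular Hough transform used in these methods can be sketched in a few lines. This is a plain textbook version, not the implementation of [5] or [17,43]: it assumes edge points have already been extracted (for example by a Canny operator), omits any validation step, and uses hypothetical names and parameters.

```python
import numpy as np

def hough_circles(edge_points, shape, radii, n_angles=72):
    """Return (votes, (row, col, radius)) of the strongest circle hypothesis.

    edge_points: (N, 2) array of (row, col) edge coordinates.
    shape: (height, width) of the image; radii: candidate radii in pixels.
    """
    pts = np.asarray(edge_points, dtype=float)
    h, w = shape
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    best = (0, None)
    for r in radii:
        acc = np.zeros((h, w), dtype=np.int32)
        # Every edge point votes for all centres lying at distance r from it.
        cy = np.rint(pts[:, 0, None] - r * np.sin(thetas)).astype(int)
        cx = np.rint(pts[:, 1, None] - r * np.cos(thetas)).astype(int)
        ok = (cy >= 0) & (cy < h) & (cx >= 0) & (cx < w)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
        peak = np.unravel_index(np.argmax(acc), acc.shape)
        if acc[peak] > best[0]:
            best = (int(acc[peak]), (int(peak[0]), int(peak[1]), int(r)))
    return best
```

Because the accumulator grows with the image size and the number of candidate radii, practical systems usually restrict the radius range using the known ball size and the distance calibration of the vision system.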
The NuBot team also proposed an arbitrary ball recognition algorithm based on omnidirectional vision [10,44]. It was concluded that the ball on the field could be imaged as an approximate ellipse in panoramic images, so the ball could be recognized without colour classification by detecting the ellipse with an image processing algorithm. Once the ball has been detected globally, the ball can be tracked in real-time by integrating a ball speed estimation algorithm. The experimental results show that the arbitrary FIFA ball can be recognized and tracked effectively in real-time even if the environments are cluttered. This algorithm does not need any learning or training steps and global recognition can be dealt with. More effective tracking algorithms or other recognition methods should be integrated into this algorithm, so the robot can recognize and track the ball more effectively even when the ball is occluded frequently.

The estimation of the object's motion
In the highly dynamic RoboCup MSL environment, accurate estimation of the object's motion, such as the velocity of the ball, the opponent robot or the robot itself, is the basis for successful ball passing and interception, for accurate motion planning and control for obstacle avoidance and, therefore, for the best choice of robot behaviour.
In [45], Lauer et al. assumed that the motion of the ball rolling on the field is a linear movement with constant velocity during a short time interval, so the estimation of the ball velocity could be modelled as a standard linear regression problem that can be solved by ridge regression. They then used a similar algorithm to evaluate the ego-motion of the robot itself, so collisions with obstacles could be detected reliably [46]. In [47], a Kalman Filter was used to detect whether the ball was moving or stationary. In the corrector part of the Kalman Filter, a multilayer perceptron artificial neural network was integrated to reduce the effect of image noise caused by the motion vibration of the robot, so the robustness of the state detection could be improved.
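Under the constant-velocity assumption, the regression can be written as a small worked example. The sketch below fits p(t) = p0 + v t to recent ball observations and penalises only the velocity term; the regularisation weight `lam`, the choice of what to penalise and the function name are illustrative assumptions rather than details taken from [45].

```python
import numpy as np

def estimate_ball_velocity(t, pos, lam=1e-4):
    """Ridge-regression fit of p(t) = p0 + v * t.

    t: (N,) observation times in seconds; pos: (N, 2) ball positions in metres.
    Returns (p0, v), each of shape (2,).
    """
    t = np.asarray(t, dtype=float)
    pos = np.asarray(pos, dtype=float)
    X = np.column_stack([np.ones_like(t), t])   # design matrix [1, t]
    R = np.diag([0.0, lam])                     # penalise the velocity only
    beta = np.linalg.solve(X.T @ X + R, X.T @ pos)
    return beta[0], beta[1]
```

The penalty shrinks the velocity estimate slightly towards zero, which damps the influence of measurement noise when only a few observations in a short time window are available.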
The estimation methods mentioned above can only be used in a situation where the object is located on the ground field. However, the ball is often lifted by the robots' high kicks during RoboCup MSL competition. Estimating the ball's motion in three-dimensional space is therefore very important for improving the defence ability of soccer robots, especially the goalie robot. In [48,49], a particle filter was applied to track the ball using omnidirectional vision in three-dimensional space so that the three-dimensional ball velocity could be estimated. In this method, the three-dimensional shape of the ball was considered and the colour histograms of the inner and outer boundary on the panoramic image projected by the ball were used to construct the observation model in the particle filter. The experimental results show that it can precisely track not only the ball in three-dimensional space, but also the robot on the ground field.

The visual self-localization
Robots should be able to localize themselves on the field in order to perform coordination, cooperation, motion planning, etc. The challenging issues in visual self-localization are as follows. During the competition, the robots from both teams move quickly and often unpredictably, and they often collide, so the robot vision systems are often occluded by teammates or opponents and wrong self-localization cannot be avoided completely. In addition, according to the current rules, the illumination is not specified and more and more natural light has been added to the field, which also poses challenges for visual self-localization. Robots should realize accurate, robust and real-time self-localization in this highly dynamic environment and be able to detect incorrect localization and then recover the correct localization globally.

In the past decade, four kinds of localization methods have been developed to solve robot self-localization in MSL:
• The triangulation approach, which extracts landmarks such as blue/yellow goals and goalposts [50,51].
• The geometry localization approach, which extracts the field lines with a Hough transform and then identifies the lines using goal or goalpost information [8].
• The Monte Carlo Localization (MCL) method, also called the particle filter localization method [52-54].
• The localization approach based on matching optimization (for simplicity, we will call it matching optimization localization in this paper) [55,56].
Since 2008, colour goals have been replaced with white goal nets and colour goalposts have been removed, so the first two approaches cannot be used any more. The latter two approaches have become the most popular localization methods for soccer robots. MCL is an efficient implementation of general Markov localization based on Bayes filtering. In MCL, the probability density of the robot's localization is represented as a set of weighted particles. During the localization process, the following three steps are performed iteratively: resampling according to particle weights, predicting new positions for particles according to the motion model and updating and normalizing particle weights using the sensor model. The weighted mean of all the particles is the localization result. The computation cost of standard MCL is quite high, because a large number of particles are needed to localize the robot well. Therefore, several modified versions of MCL have been proposed to improve efficiency by adapting the number of particles [54]. In [54], the number of particles can be reduced to one when the localization estimation of the previous cycle is sufficiently accurate.
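The three MCL steps can be sketched compactly. The example below is a deliberately simplified illustration, not any team's implementation: the state is a 2D position only, the sensor model uses noisy ranges to three hypothetical landmarks instead of field line points, and all noise parameters and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical landmarks at known field positions (metres).
LANDMARKS = np.array([[0.0, 0.0], [18.0, 0.0], [9.0, 12.0]])

def mcl_step(particles, weights, motion, ranges, motion_std=0.05, range_std=0.2):
    """One MCL iteration; returns new particles, weights and the pose estimate."""
    n = len(particles)
    # 1. Resample according to the particle weights.
    particles = particles[rng.choice(n, size=n, p=weights)]
    # 2. Predict new positions with the (noisy) motion model.
    particles = particles + motion + rng.normal(0.0, motion_std, particles.shape)
    # 3. Update and normalise the weights with the sensor model.
    expected = np.linalg.norm(particles[:, None, :] - LANDMARKS[None], axis=2)
    log_w = -0.5 * np.sum(((expected - ranges) / range_std) ** 2, axis=1)
    weights = np.exp(log_w - log_w.max())
    weights /= weights.sum()
    # The weighted mean of all particles is the localization result.
    return particles, weights, weights @ particles
```

Starting from particles spread uniformly over the field with equal weights, repeated calls concentrate the particle set around the true position, which is what allows global localization; the cost per call grows linearly with the number of particles.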
Lauer et al. proposed an approach based on matching optimization to achieve efficient and accurate robot self-localization [55]. The main idea is to match the detected visual feature points with the field information, so robot self-localization can be modelled as an error minimization problem by defining the error function. The RPROP algorithm was used to solve this problem to acquire optimal localization. The odometry information from motor encoders was fused to calculate a smooth localization by applying a simplified Kalman filter. The experimental results show that matching optimization localization outperforms MCL in accuracy, robustness and efficiency.
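The error-minimization formulation can be illustrated with a minimal sketch. It is not the implementation of [55]: the field model is reduced to a few hypothetical axis-parallel lines, the robust error term and its constant `c` are assumptions, the gradient is computed numerically, and the RPROP variant shown is a simple sign-based scheme without weight backtracking.

```python
import numpy as np

X_LINES = np.array([-9.0, 0.0, 9.0])   # hypothetical field lines x = const (m)
Y_LINES = np.array([-6.0, 6.0])        # hypothetical field lines y = const (m)

def to_world(pose, pts):
    """Transform robot-frame points (N, 2) into the field frame for pose (x, y, theta)."""
    x, y, th = pose
    c, s = np.cos(th), np.sin(th)
    return np.column_stack([x + c * pts[:, 0] - s * pts[:, 1],
                            y + s * pts[:, 0] + c * pts[:, 1]])

def matching_error(pose, pts, c=0.5):
    """Sum of bounded errors of the distances from line points to the nearest line."""
    w = to_world(pose, pts)
    dx = np.abs(w[:, 0, None] - X_LINES).min(axis=1)
    dy = np.abs(w[:, 1, None] - Y_LINES).min(axis=1)
    d2 = np.minimum(dx, dy) ** 2
    return float(np.sum(1.0 - c * c / (c * c + d2)))

def numeric_grad(f, x, eps=1e-5):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def rprop(f, x0, iters=200, step0=0.02, step_min=1e-6, step_max=0.5):
    """Minimise f using only the sign of the gradient and adaptive step sizes."""
    x = np.array(x0, dtype=float)
    step = np.full_like(x, step0)
    prev = np.zeros_like(x)
    for _ in range(iters):
        g = numeric_grad(f, x)
        same = g * prev
        step = np.where(same > 0, np.minimum(step * 1.2, step_max), step)
        step = np.where(same < 0, np.maximum(step * 0.5, step_min), step)
        g = np.where(same < 0, 0.0, g)   # skip the update after a sign change
        x = x - np.sign(g) * step
        prev = g
    return x
```

As the text notes, such an optimization needs an initial pose close enough to the true one; from a poor starting value the nearest-line association can lock onto the wrong lines and the search ends in a local minimum.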
There are advantages and disadvantages in MCL and matching optimization localization. MCL can deal with global localization, which means that when wrong localization occurs, right localization can be recovered. So the kidnapped robot problem can be solved effectively with MCL. However, a large number of particles are needed to represent the real posterior distribution of the robot's localization well and the computation complexity increases with the number of particles. There is a contradiction between accuracy and efficiency, so it is hard to achieve robot self-localization with high accuracy and high efficiency simultaneously by using MCL. In matching optimization localization, the accuracy of robot self-localization only depends on the accuracy of optimizing computation and visual measurements and matching optimization can be completed in several milliseconds, so this approach is a localization method with both high accuracy and high efficiency. However, an initial localization value should be given to perform the optimization, so it is an algorithm for localization tracking that cannot solve the problem of global localization.
The NuBot team proposed a self-localization algorithm by combining these two approaches to maintain their advantages and avoid the disadvantages [57,58]. Firstly, MCL was used to achieve global localization and then matching optimization localization was applied to realize accurate and efficient localization tracking with the result of MCL as the initial localization value. When incorrect localization occurs, global localization with MCL will be restarted to retrieve the correct localization. After integrating the camera parameters' auto-adjusting method based on image entropy [24], the experimental results show that global localization can be realized effectively while highly accurate localization is achieved in real-time and robot self-localization is robust in the highly dynamic environment with occlusions and changing lighting conditions.

Multi-robot cooperative sensing
Multi-robot cooperative sensing is an important research issue in multi-robot systems and the RoboCup MSL is an ideal test bed for multi-robot cooperative sensing.
Because the field of view or the sensing range of the robot's vision system is limited, important objects like the ball are occluded frequently by teammates or opponents. Furthermore, all the robots are distributed and autonomous, so the world models built by the individual robots cannot be fully consistent with each other because of unavoidable sensing noise. These factors will cause inconsistency in the cooperative behaviours between robots. Realizing cooperative localization of the ball and other objects and cooperative self-localization between multiple robots can improve the sensing accuracy to build up a coherent world model for the whole robot team, which has become more and more significant for improving the performance of the whole robot team.
In [59], Durrant-Whyte's approach was used to fuse the position of the ball between teammates, so a globally coherent estimation about the position of the ball could be achieved. A multi-robot/sensor cooperative object detection and tracking method based on a decentralized Bayesian approach was proposed in [60]. A local filter and a team filter were included in this method. When the robot could see the ball, the local filter was run to fuse the ball information from the robot itself and teammates. When the robot could not detect the ball, the team filter was run to fuse the ball information from different teammates. In [61], fuzzy logic was used to fuse the visual observations of the ball from several robots. In this method, the uncertainty of the robot self-localization was taken into consideration, which was propagated to the uncertainty of the observation of the ball. The CAMBADA team described their work on the information fusion for multi-robot system in [56,62]. A real-time database [63] was used to realize information sharing between the robots. The shared teammates' self-localization was used to judge whether the detected black obstacle is a teammate robot or an opponent robot.
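A common way to merge several uncertain observations of the same ball is inverse-covariance weighting, sketched below. It is shown only as a generic illustration of cooperative fusion and is not claimed to be the scheme of [59], [60] or [61]; it assumes Gaussian, independent estimates expressed in a common field frame, and the function name is hypothetical.

```python
import numpy as np

def fuse_ball_estimates(estimates):
    """Fuse a list of (mean (2,), covariance (2, 2)) ball position estimates.

    Each estimate is weighted by its inverse covariance, so more certain
    observations contribute more. Returns the fused mean and covariance.
    """
    info = np.zeros((2, 2))
    info_vec = np.zeros(2)
    for mean, cov in estimates:
        inv = np.linalg.inv(np.asarray(cov, dtype=float))
        info += inv
        info_vec += inv @ np.asarray(mean, dtype=float)
    fused_cov = np.linalg.inv(info)
    return fused_cov @ info_vec, fused_cov
```

Because each teammate's ball observation is transformed with its own self-localization result, the covariance passed in should also reflect the localization uncertainty of the observing robot, as considered in [61].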

Developing trends and the research focuses
For the improvement of the soccer robots' performance and the realization of the final goal of RoboCup, the overall developing trend in visual perception is to provide more and more information with high accuracy in real-time in the more and more complex competition environment for the RoboCup MSL soccer robots. The following issues should be the focus of future research into visual perception in RoboCup MSL:
1. The robustness of the robot vision system should be improved to make it work reliably in indoor and outdoor environments with highly dynamic lighting conditions.
2. Object detection, localization, prediction and tracking should be realized in three-dimensional space [19,64,65] because the ball is often lifted by the robots' high kicks. This problem can be dealt with by constructing a hybrid vision system using a binocular stereovision system and an omnidirectional vision system, so the advantages of these two vision systems can be combined.
3. The real-time performance and the effectiveness of the current algorithms for arbitrary FIFA ball recognition should be further improved to be able to work well in a complex environment with lots of disturbance.
4. The accuracy of the estimation of the moving object's velocity and acceleration in three-dimensional space should be improved as the competition becomes more and more fierce and dynamic.
5. The robots without specific black colour should be recognized effectively to obtain robot identification information such as the number and team that the robot belongs to [66,67] by introducing more novel and advanced theories and techniques for generic object recognition in the computer vision and pattern recognition community [68][69][70], so the colour-coded extent of the RoboCup MSL environment can be reduced, which will also promote the fusion between the computer vision/pattern recognition community and the robotics community.
6. The coherence and the sensing accuracy of each robot's world model in a multi-robot system should be improved by cooperative sensing, so the dependence on communication can be reduced for cooperation and coordination in a multi-robot system.
7. More embedded vision devices can be used in the RoboCup MSL [71] to augment the performance of the robot vision system, because the research into and the application of embedded vision have become more and more popular in the machine vision community.

Conclusions
In this paper, we review the advancements in visual perception in the RoboCup MSL soccer robots achieved over the past decade and present the developing trends and research focuses. To our knowledge, this is the first review paper focusing on the visual perception of the RoboCup MSL soccer robots. Therefore, it is especially valuable for newcomers to robot soccer and researchers who are not familiar with, but are interested in, robot soccer.
In the future, the visual perception in all the other RoboCup leagues and FIRA leagues should also be analysed and summarized, so together with this paper, a full panorama about the vision techniques for robot soccer can be provided.

Figure 1. A typical scene of the RoboCup MSL competition.

Figure 2. The sketch of the catadioptric omnidirectional vision consisting of a convex mirror and a camera.

Figure 3. The profile curve of the panoramic mirror developed by the NuBot team.