Markerless Eye-Hand Kinematic Calibration on the iCub Humanoid Robot

Humanoid robots are versatile platforms that can be used in diverse application scenarios. However, their high number of degrees of freedom (i.e., moving arms, head and eyes) deteriorates the precision of eye-hand coordination. A good kinematic calibration is often difficult to achieve due to several factors, e.g., unmodeled deformations of the structure or backlash in the actuators. This is particularly challenging for very complex robots such as the iCub humanoid robot, which has 12 degrees of freedom and cable-driven actuation in the serial chain from the eyes to the hand. The exploitation of real-time robot sensing is of paramount importance to increase the accuracy of the coordination, for example, to realize precise grasping and manipulation tasks. In this code paper, we propose an online and markerless solution to the eye-hand kinematic calibration of the iCub humanoid robot. We have implemented a sequential Monte Carlo algorithm that estimates kinematic calibration parameters (joint offsets), improving the eye-hand coordination based on the proprioception and vision sensing of the robot. We show the usefulness of the developed code and its accuracy in simulation and real-world scenarios. The code is written in C++ and CUDA, where we exploit the GPU to increase the speed of the method. The code is made available online, along with a dataset for testing purposes.


Proposed Solution
In this code paper, we propose a markerless hand pose estimation software for the iCub humanoid robot [Metta et al. (2010)], along with an eye-hand kinematic calibration. We exploit the 3D CAD model of the robot embedded in a game engine, which works as the robot's internal model. This tool is used to generate multiple hypotheses of the hand pose and compare them with the real visual perception. Using the information extracted from the robot motor encoders, we generate hypotheses of the hand pose and its appearance in the cameras, which are combined with the actual appearance of the hand in the real images using particle filtering, a sequential Monte Carlo method. The best hypothesis of the 6D hand pose is used to estimate the corrective terms (joint offsets) that update the robot kinematic model. The vision-based estimation of the hand pose is used as an input, together with the proprioception, to continuously calibrate (i.e., update) the robot's internal model. At the same time, the internal model is used to provide better hypotheses for the hand position in the camera images, therefore enhancing the robot's perception. The two processes help each other, and the final outcome is that we keep the internal model calibrated and obtain a good estimation of the hand pose without using specialized visual markers on the hand.
The original research work [Vicente et al. (2016a); Vicente et al. (2016b)] contains, respectively: (1) a complete motivation from the developmental psychology point of view and the theoretical details of the estimation process, and (2) technical details on the interoperability between the several libraries and the GPGPU approach used to speed up the method.
The present manuscript is a companion and complementary code paper to the method presented in Vicente et al. (2016a). We will not describe the theoretical perspective of our work in full detail; instead, we will focus on the resulting software system, connecting the code with the solution proposed in Vicente et al. (2016b). Moreover, the objective of this publication is to give a hands-on perspective on the implemented software, which can be used and extended by the research community.
The source code is available at the official GitHub code repository, https://github.com/vicentepedro/Online-Body-Schema-Adaptation, and the documentation on the Online Documentation page: http://vicentepedro.github.com/Online-Body-Schema-Adaptation. For a more detailed review of the state of the art, please check the article Vicente et al. (2016a).

We use a Sequential Monte Carlo parameter estimation method to estimate the calibration error β in the 7D robot's joint space corresponding to the kinematic chain going from each eye to the end-effector. Let us consider:

θ_r = θ + β,

where θ_r are the real angles, θ are the measured angles, and β are joint offsets representing the calibration errors. Given an estimate of the joint offsets (β̂), a better end-effector pose can be retrieved using the forward kinematics.
One of the proposed solutions for using Sequential Monte Carlo methods for parameter estimation (i.e., of the parameters β in our problem) is to introduce an artificial dynamics, changing from a static transition model (β_t = β_{t-1}) to a slowly time-varying one:

β_t = β_{t-1} + w_t,

where w_t is an artificial dynamic noise that decreases when t increases.

Software Design and Architecture Principles
The software design and architecture for implementing the eye-hand kinematic calibration solution has the following requirements: (1) the software should be able to run in real-time since the objective is to calibrate the robot during a normal operating behaviour, and (2) it should be possible to run the algorithm in a distributed way, i.e., run parts of the algorithm in several computers in order to increase computation power. The authors decided to implement the code in C++ in order to cope with the real-time constraint, and to exploit the YARP middleware [Metta et al. (2006)] to distribute the components of the algorithm in more than one machine.
The source code for these modules is available at the official GitHub code repository (check section 2).
The code is divided into three logical components: (1) the hand pose estimation (section 4.1), (2) the Robot's Internal Model generator (section 4.2), and (3) the likelihood assessment (section 4.3). The software architecture implementing the proposed eye-hand calibration solution can be seen in Figure 1. There are two versions of the internal model on the repository: one for the right hand (rightA) and another one for the left hand (leftA). Our approach was to divide the two internal models since we have separate calibration parameters for the head-left-arm and for the head-right-arm kinematic chains. The Unity platform was chosen to develop the internal model of the robot since it is able to generate a high number of frames per second on the GPU, even for complex graphics models. The scripting component of the Unity game engine was programmed in C#. The C# bindings of YARP were used to enable the internal model generator to communicate with the other components of the system. This component is OS-dependent and only runs on Windows; the build version available on the repository does not require a paid Unity Pro license.
Finally, the likelihood assessment is called inside the Robot's Internal Model as a Dynamic Link Library and exploits GPGPU programming to compare the real perception with the multiple generated hypotheses. The GPGPU programming, using the CUDA library [Nickolls et al. (2008)], allows the algorithm to run in quasi-real-time. The .cpp file contains the likelihood computation method, and the .cu file the GPGPU program.
Our eye-hand calibration solution exploits vision sensing to reduce the error between the perception and the simulated hypotheses. The OpenCV library [Bradski (2000)] with CUDA-enabled capabilities [Nickolls et al. (2008)] was chosen to exploit computer vision algorithms and run them in real-time.
The interoperability between the OpenCV, CUDA and OpenGL libraries was studied in Vicente et al. (2016b). In the particular case of the iCub humanoid robot [Metta et al. (2010)], and to fit within the YARP and iCub architectures, we encapsulated part of the code in an RFModule class structure (http://www.yarp.it/classyarp_1_1os_1_1RFModule.html) and use YARP buffered ports (http://www.yarp.it/classyarp_1_1os_1_1BufferedPort.html) and RPC services (http://www.yarp.it/classyarp_1_1os_1_1RpcServer.html) for communications and the user interface (check section 5.2.3). The hand pose estimation module allows the user to send requests to the algorithm, which follows an event-driven architecture: for each new piece of incoming information from the robot (cameras and encoders), a new iteration of the Sequential Monte Carlo parameter estimation is performed.

Initializing the Sequential Monte Carlo parameter estimation -initSMC Function
In the function initSMC, we initialize the variables of the Sequential Monte Carlo parameter estimation, i.e., the initial distribution p(β_0) [Eq. (10) in Vicente et al. (2016a)] and the initial artificial dynamic noise. Listings 1 contains the initSMC function, where some of the variables (in red) are parametrized at initialization time (check sub-section 5.2.1 for more details on the initialization parameters). We use a random seed generated according to the current time and initialize each angular offset with a Gaussian distribution: N(initialMean, initialStdDev).

Read Image, Read Encoders, ProcessImages and SendData
The left and right images along with the head and arm encoders are read at the same time to ensure consistency between the several sensors.
The reading and processing procedures of the images are defined inside the function handPoseEstimationModule::updateModule(), which can be found in the file src/handPoseEstimationModule.cpp.
The function processImages (see Listings 2) applies a Canny edge detector and a distance transform to both images separately. Moreover, the left and right processed images are merged, i.e., concatenated horizontally, in order to be compared with the generated hypotheses inside the Robot's internal model.
The Hand pose estimation module sends: (1) the pre-processed images, (2) the head encoders and (3) the arm encoders (θ) along with the offsets (β) to the Robot's internal model (see Eq. 1 and handPoseEstimationModule.cpp#L214). This procedure is defined inside the function handPoseEstimationModule::runSMCIteration().

Update Likelihood
The Hand Pose Estimation module receives the likelihood vector from the Robot's internal model and updates the likelihood value for each particle in the for-loop at handPoseEstimationModule.cpp#L225.

Figure 1 | Architecture of the software. The hand pose estimation component (handPoseEstimation) initiates the Sequential Monte Carlo parameter estimation method (initSMC) and waits for a start command from the user. The perception and proprioception (cameras and encoders) of the robot are received and the parameter estimation starts. The real image and the particles are sent (sendData) to the Robot's internal model (icub-internalmodel-rightA-cam-Lisbon.exe or icub-internalmodel-leftA-cam-Lisbon.exe) in order to generate the hypotheses. The likelihood assessment of each hypothesis is calculated using a Dynamic Link Library (DLL) file inside the Robot's internal model. The likelihood of each particle is saved, and a Kernel Density Estimation is performed to calculate the best calibration parameters. The resampling step is performed, and a new set of particles is saved for the next iteration of the Sequential Monte Carlo parameter estimation.

Kernel Density Estimation
Although the state is represented at each time step as a distribution approximated by the weighted particles, our best guess for the angular offsets can be computed using a Kernel Density Estimation (KDE) to smooth the weight of the particles according to the information of neighbor particles, and choose the particle with the highest smoothed weight (ωʹ [i] ) as our state estimate [Section 3.5 of Vicente et al. (2016a)].
The implementation of the KDE with a Gaussian kernel can be seen in Listings 3. The double for-loop implements the KDE, accessing each particle (iParticle) and computing the influence of each neighbor (mParticle) according to the relative distance in the 7D space between the two particles and the likelihood of the neighbor [cvmGet(particles, 7, mParticle)]. The parameters that can be fine-tuned are highlighted in red.

Best Hypothesis
The best hypothesis, computed using the KDE, is sent through a YARP buffered port from the module after N iterations. The port has the following name:

/hpe/bestOffsets:o
The parameter N (the number of elapsed iterations before sending the estimated angular offsets) can be changed by the user at initialization time using the minIteration parameter (check Section 5.2.1 for more details). Its objective is to ensure the filter has converged before the estimate is used (e.g., to control the robot). This is an important parameter, since in the initial stages the estimate can vary considerably from one iteration to the next (before converging to a more stable solution).

Update Artificial Noise, Resampling and New Particles
The artificial noise is updated according to the maximum-likelihood criterion. See the pseudo-code in Listings 4, which corresponds to lines 230 to 254 in the file:

src/handPoseEstimationModule.cpp
We update the artificial noise according to the maximum likelihood: if the maximum likelihood is below a certain threshold (minimumLikelihood), we do not perform the resampling step and we increase the artificial noise; if, on the other hand, the maximum likelihood is greater than the threshold, we apply the resampling and decrease the artificial noise. The objective is to prevent the particles from becoming trapped in a "local maximum": when the current best solution is not good enough to justify resampling, the increased noise forces the particles to explore the state space.
The trade-off between exploration and exploitation is measured according to the maximum likelihood in each time step of the algorithm. The idea is to exploit the low number of particles in a clever way. Moreover, the upper and lower bounds on the artificial noise ensure, respectively, that: (1) the noise will not increase asymptotically, and the samples will be spread over the 7D state-space, and (2) the particles will not end up all at the same value, which can happen when the random noise is zero. On the resampling stage, we use the systematic resampling strategy [check Hol et al. (2006)], which ensures that a particle with a weight greater than 1/M is always resampled, where M is the number of particles.

Robot's Internal Model Generator
Listings 5 shows the general architecture of the Robot's Internal Model Generator using pseudo-code.

Initialization of the Render Textures
The render textures, which will be used to render the two camera images, are initialized for each particle for both left and right views of the scene.

Generate Hypotheses
The hypotheses are generated on a frame-based approach, i.e., we generate one hypothesis for each frame of the "game". After we receive the vector with the 200 hypotheses to generate, we virtually move the robot to each of the configurations to be tested and record both images (left and right) in a renderTexture.
After the 200 generations, we call the likelihood assessment DLL function to perform the comparison between the real images and the generated hypotheses.
The available version of the Robot's internal model generator is a compiled, self-contained executable, which works on Windows-based computers with the installed dependencies (the list of dependencies can be seen in Section 5.1.2). Moreover, it requires neither the Unity® Editor to be installed on the computer nor a Unity Pro license.
More details on the creation of the Unity ® iCub Simulator for this project can be found in Vicente et al. (2016b) Sec. 5.2 -"The Unity ® iCub Simulator".

Likelihood Assessment Module
The likelihood assessment is based on the observation model defined in Vicente et al. (2016a) Section 3.4.2.
We exploit an edge-based extraction approach along with a distance transform algorithm, computing the likelihood using the Chamfer matching distance [Borgefors (1986)].
In our code, these quantities are computed on the GPU using the OpenCV and CUDA libraries and the interoperability between these libraries and the OpenGL library. The solution adopted was to add the likelihood assessment as a C++ plugin called inside the internal model generator module. The likelihood.cpp file, particularly the function CudaEdgeLikelihood, is where the likelihood of each sample is computed. Part of the code of the likelihood function is shown and analysed in Listings 6. Up to line 21 of Listings 6, we exploit the interoperability between the libraries used (OpenGL, CUDA, OpenCV); after line 21, we apply our likelihood metric using the functionality of the OpenCV library, where GgpuMat is the generated image of the ith sample and GgpuMat_R is the real distance transform image. In line 35, lambdaEdge is a parameter that tunes the distance metric sensitivity [check Vicente et al. (2016a), Eq. (21)]; it is initialized to the value 25 in line 1 (corresponding to line 148 of the C++ file). When the generated image does not have edges (i.e., the hand is not visible by the cameras), we force the likelihood of this particle to be almost zero (lines 37 and 39, respectively). The maximum likelihood (i.e., the value 1.0) is achieved when each entry of the result image is zero, which happens when every edge on the generated image matches a zero distance on the distance transform image. The multiplication by 1,000 and the int cast in line 42 are used to send the likelihood as an int value (the inverse process is performed in the internal model when it receives the likelihood vector). This is one of the limitations of the current approach: due to software limitations, the authors could not send a double value between 0 and 1 directly.

Listings 5 | General architecture of the Robot's Internal Model Generator (pseudo-code):

    for (each iteration)            // for each iteration of the SMC
    {
        waitForInput();             // wait for the input vector with the particles to be generated
        for (each particle)
        {
            moveTheInternalModel();     // change the robot's configuration
            RenderAllucinatedImages();  // render the left and right images on a render texture
            nextFrame();
        }
        // After 200 frames, call the DLL function (CudaEdgeLikelihood)
        // to compare the hypotheses with the real image.
        ComputeLikelihood(AllucinatedImages(200), RealImage);
    }

Application and Utility
The markerless kinematic calibration can run during normal operation of the iCub robot, updating the joint offsets according to the new incoming observations. One can also stop the calibration and use the offsets estimated so far; however, to achieve better accuracy across different poses of the end-effector, the method should be kept running in an online fashion to continuously adapt the parameters.
The details of the dependencies, the installation and how to run the modules can be found on the Online Documentation page (check Section 2).

Installation and Dependencies
The dependencies of the proposed solution can be divided into two sets of libraries: (1) the libraries needed to run the handPoseEstimation module, and (2) the libraries needed to run the Robot's internal model and the likelihood assessment.

Hand Pose Estimation Module
The handPoseEstimation module depends on the YARP library, which can be installed following the installation procedure of the official repository. Moreover, it depends on the OpenCV library.
We tested this module with the latest release of YARP at the time of writing (version 2.3.70, June 15, 2017) and with the OpenCV library V2.4.10 and V3.3; the code works with both versions. The authors recommend the reader to follow the official installation guides for these libraries.
To install this module, one can simply run CMake using the CMakeLists.txt in the folder /modules/handPoseEstimation/.

Robot's Internal Model Generator and Likelihood Assessment
The Robot's internal model and the likelihood assessment depend on the YARP library for communication and on the OpenCV library with CUDA-enabled computation (i.e., installing the CUDA toolkit) for image processing and GPGPU-accelerated algorithms. A Windows machine should be used to install this module.
The tested version of the OpenCV library was V2.4.10 with the CUDA toolkit 6.5. The C# bindings for the YARP middleware should be compiled on a Windows machine. The details regarding the installation procedures can be found at the following URL: http://www.yarp.it/yarp_swig.html#yarp_swig_windows.

Frontiers in Robotics and AI | www.frontiersin.org
The C# bindings will allow the internal model generator to communicate with the other modules.
The C# bindings will generate a DLL file that, along with the DLL generated from the likelihood assessment module, should be copied to the Plugins folder of the internal model generator. In the official compiled version on the repository, this folder has the following path: internalmodel/icub-internalmodel-rightA-cam-Lisbon_Data/Plugins/. The complete, step-by-step installation procedure can be seen on the Online Documentation page, in the Installation section.

Running the Modules
The proposed method can run on a cluster of computers connected with the YARP middleware. The internal model generator should run on a computer with the Windows operating system and with CUDA capabilities. The step-by-step running procedure guide can be found on the Online Documentation page. The rest of this section gives a high-level perspective on running the algorithm. The required YARP connections between the several components can be made through the XML file under the app/scripts folder.

Running the Hand Pose Estimation and its parameters
The Hand Pose Estimation module can be initialized using the yarpmanager or by running the module in a terminal, where <value> is the value for one of the parameters (<parameter_name>) defined in the list below:

• name: name of the module (default = "hpe")
• arm: arm to which the module should connect (default = "right")
• initialMean: mean of the initial distribution of the particles
• minIteration: minimum number of iterations before sending the estimated offsets. The objective is to give the algorithm time to converge; without this feature, one can receive completely different offsets from iteration t to t + 1 during the filter convergence (default = 35)

Figure 2 | Projection of the fingertips on the left camera in simulated robot experiments. The blue dot represents the end-effector projection (i.e., the base of the middle finger), red represents the index fingertip, green the thumb fingertip, dark yellow the middle fingertip, and soft yellow the ring and little fingertips. The left image (A) shows the canonical projection (i.e., with β = 0) and the right image (B) the projection with the estimated offsets (β̂).

Running the Robot's Internal Model
The internal model generator should be run in a terminal using the following command:

icub-internalmodel-rightA-cam-Lisbon.exe -force-opengl

The -force-opengl argument forces the robot's internal model to use the OpenGL library for rendering purposes, which is fundamental for the interoperability between the libraries.

User interface
The user can send commands to the Hand Pose Estimation algorithm through the RPC port hpe/rpc:i. The RPC port acts as a service to the user, through which the algorithm can be started, stopped, or paused/resumed. It is also possible to request the last joint offsets estimated by the algorithm. The thrift file (modules/handPoseEstimation/handPoseEstimation.thrift) contains the input and output of each RPC service function (i.e., start, stop, pause, resume, lastOffsets and quit). More details about these commands can be seen in the use procedure in the documentation. Moreover, after connecting to the RPC port (yarp rpc hpe/rpc:i), the user can type help to get the available commands. The module also replies with the input and output parameters of a given command if the user types help FunctionName (e.g., help start).

Experiments and Examples of Use
The experiments performed with the proposed method on the iCub simulator, with ground truth data, have shown a good accuracy on the hand pose estimation, where artificial offsets were introduced in the seven joints of the arm. The results on the real robot have shown a significant reduction of the calibration error [Check Vicente et al. (2016a) Section 5 for more results in simulation (Section 5.1) and with the real iCub (Section 5.2)].
For the reader to be able to test the algorithm, the authors collected a simulated dataset (encoders of the head and arms, and the left and right images) which can be used to test the algorithm.
The simulation results of the present article were obtained running the above-stated code with the default parameters on the collected dataset.
The robot performed a babbling movement, which consists of a random walk in each joint. The bounds of the uniform distribution used to generate the babbling movement start at [-5, 5]° and are reduced during the movement to [-0.5, 0.5]°. Despite large errors in the robot's kinematic chain, the algorithm was able to converge to the solution in Figure 2. Moreover, the cluttered environment in the background did not influence the filter convergence. The reader can see the projection of the fingertips on the left camera image: (1) according to the canonical representation in Figure 2A (where an error-free kinematic structure is assumed, i.e., β = 0), and (2) with the corrected kinematic structure using the algorithm implemented and documented in this code paper in Figure 2B.
The convergence of the algorithm, along with a side-by-side comparison with the canonical solution, can be seen in the following video: https://youtu.be/0tzLFqZLbxc. The simulated dataset is available at https://github.com/vicentepedro/eyeHandCalibrationDataset-Sim.

On the real robot, we have already performed several experiments in previous works, with different initial and final poses, using the 320 × 240 cameras. In Figure 3, one can see one example of the hand estimation. While the image on the left (Figure 3A) shows the canonical estimation of the hand projected on the left camera image according to the non-calibrated kinematic chain, the image on the right (Figure 3B) shows the corrected kinematic chain, which originates a better estimation of the hand pose. The rendering of the estimated hand pose takes into account the joint offsets on the kinematic chain before computing the hand pose in the image reference frame.

Figure 3 | Projection of the fingertips on the left camera in real robot experiments. The left image (A) shows the canonical projection (i.e., with β = 0), and the right image (B) the projection according to the corrected kinematic chain using the estimated offsets (β̂).