Automatic self-contained calibration of an industrial dual-arm robot with cameras using self-contact, planar constraints, and self-observation

We present a robot kinematic calibration method that combines complementary calibration approaches: self-contact, planar constraints, and self-observation. We analyze the estimation of the end effector parameters, joint offsets of the manipulators, and calibration of the complete kinematic chain (DH parameters). The results are compared with ground truth measurements provided by a laser tracker. Our main findings are: (1) When applying the complementary calibration approaches in isolation, the self-contact approach yields the best and most stable results. (2) All combinations of more than one approach were always superior to using any single approach in terms of calibration errors and the observability of the estimated parameters. Combining more approaches delivers robot parameters that better generalize to the workspace parts not used for the calibration. (3) Sequential calibration, i.e. calibrating cameras first and then robot kinematics, is more effective than simultaneous calibration of all parameters. In real experiments, we employ two industrial manipulators mounted on a common base. The manipulators are equipped with force/torque sensors at their wrists, with two cameras attached to the robot base, and with special end effectors with fiducial markers. We collect a new comprehensive dataset for robot kinematic calibration and make it publicly available. The dataset and its analysis provide quantitative and qualitative insights that go beyond the specific manipulators used in this work and apply to self-contained robot kinematic calibration in general.


Introduction
Accurate calibration is essential for the performance of every robot. Traditional calibration procedures involve some form of external measuring apparatus and become impractical if the robot itself or the site where it is deployed change frequently, which is the current trend in automation with the shift from mass to small-batch production. Furthermore, collaborative and social robots often employ cheaper and more elastic materials, making them less accurate and prone to frequent re-calibration. At the same time, advances in sensor technology make affordable but increasingly accurate devices such as standard or RGB-D cameras, tactile, force, or inertial sensors available; often, robots come already equipped with a selection of them. These factors together constitute the need and the opportunity to perform an automated self-contained calibration relying on redundant information in these sensory streams originating from the robot platform itself.
We present a robot kinematic calibration method, which combines complementary calibration approaches: self-contact, planar constraints, and self-observation. There are two main approaches to robot calibration [1]: (i) open-loop approaches, where the manipulator is not in mechanical (physical) contact with the environment and an external metrology system is used to measure robot pose components, and (ii) closed-loop approaches, where physical constraints on robot pose components substitute for external measurements. However, in their standard formulations, both approaches require a setup that is external to the robot itself. This work aims to extend these well-established tools to automatic self-contained robot calibration [2] by including camera reprojection errors, constraints arising from robot self-contact, and simultaneous calibration of multiple kinematic chains.
In this work, we employ two industrial manipulators mounted on a common base with force/torque sensors at their wrists, two cameras overlooking the workspace, and special end effectors with fiducial markers. Using this setup, we study different kinematic calibration methods: self-contact, planar constraints, and self-observation. These can be employed separately or in combination (simultaneously or sequentially). Fig. 1 shows an example: a self-contact configuration provides constraints that can be exploited to calibrate the kinematic chain of one or both arms. At the same time, the end effectors are observed by the cameras, providing additional data (equations) for calibrating the manipulators' kinematics and the camera orientation parameters.
We evaluate the performance of individual approaches on the calibration of: (i) parameters of the custom end effector, (ii) joint offsets of the complete kinematic chain of one arm, and (iii) the full Denavit-Hartenberg (DH) representation of the platform. Calibration using an external measurement device (laser tracker) is performed and serves together with the nominal manipulators' kinematic parameters as a reference. The calibration performance of individual methods is compared on an independent testing dataset which covers a significant portion of the robot workspace.
We present four main contributions. First, we provide a thorough experimental comparison of geometric kinematic calibration using self-contained approaches (self-contact, planar constraints, and self-observation) with kinematic calibration based on an external laser tracker. We demonstrate the viability of the self-contained approaches. Second, we design ways in which the individual methods can be combined into a single cost function. The merits of such a synergistic approach, namely better identifiability of the calibrated parameters and generalization to the part of the robot workspace that was not used for parameter calibration, are empirically evaluated. Third, we compare simultaneous (all parameters at once) with sequential calibration. Fourth, we collect a new comprehensive dataset for robot kinematic calibration and make it publicly available. This dataset and its analysis provide quantitative and qualitative insights that go beyond the specific manipulators used in this work and apply to self-contained robot kinematic calibration in general.
To our knowledge, this work is the first that bridges the realm of very accurate industrial calibration using external metrology with compact approaches relying on sensors on the robot, typical for humanoid or social robotics. The innovation this work brings is that it expands the portfolio of traditional calibration methods with self-contained approaches such as self-contact or self-observation. While traditionally industrial robots did not have the means to detect contacts, this is changing now with the advent of collaborative robots, where sensitive contact detection is necessary for safe human-robot collaboration. Furthermore, torque sensing within the robot structure or force/torque sensors at the flange facilitate dexterous manipulation and interaction with the environment. At the same time, integrated vision sensors (cameras, RGB-D sensors) are becoming popular. Here we provide a framework that makes it possible to exploit all available calibration methods and sensory streams simultaneously and combine them in an optimal way, including weighting by their respective accuracy.
This article is structured as follows. Related work is reviewed in Sec. 2. Section 3 describes the experimental setup and data acquisition: the robot (Sec. 3.1) and laser tracker (Sec. 3.2) setups, robot control using force feedback (Sec. 3.3), robot and end effector dimensions (Sec. 3.4), camera calibration (Sec. 3.5), and a description of the individual acquired datasets (Sec. 3.6). The optimization problem formulation for different combinations of kinematic chains is described in Sec. 4. Experimental results for the end effector, manipulator offsets, and all-DH-parameters calibration (including the laser tracker reference) are presented in Sec. 5. Finally, we present the conclusion (Sec. 6) and discussion, including future work (Sec. 7). The accompanying video illustrating the experiments is available at https://youtu.be/LMwINqA1t9w; the dataset is available from [3].
Cameras are used less often: for example, stereo cameras combined with a calibration sphere [9] or a camera together with a laser pointer at the end effector [10]. All of these systems require specialized external equipment. Our platform was previously calibrated using two different methods: (1) Redundant parallel Calibration and measuring Machine (RedCaM) by Beneš et al. [11], Volech et al. [12], and (2) Leica laser tracker. Petrík and Smutný [13] reviewed the precision of these methods using a linear probe sensor.
Since we aim at automatic self-contained multisensorial calibration-that is, calibration using sensors on the robot, involving multiple sensory modalities like vision and touch-we next focus on reviewing the state of the art in calibration methods that do not rely on external metrology systems. We will also pay special attention to humanoid-like setups that offer the richest possibilities for self-contained calibration.
Calibration by self-observation. Cameras mounted on a robot can be used to calibrate the robot by closing the calibration loop through self-observation of its end effectors. The theory for this approach is laid out in [14] for a stereo camera system observing a robot arm. The manipulator's kinematics and extrinsic and intrinsic camera parameters are calibrated. Self-observation has been applied to humanoid robots viewing their hands with fiducial markers, using online methods to calibrate kinematics relying on gradient descent by Hersch et al. [15] and recursive least squares estimation by Martinez-Cantin et al. [16]. Fiducial markers can sometimes be avoided when the robot wrist, hand, or fingertip is identified directly in the image [17,18]. The work of Birbach et al. [2] on the humanoid Agile Justin will be discussed in more detail below.

Calibration using physical constraints. The next family of approaches exploits physical contacts of the end effector with the environment, such as fixing the end effector to the ground [19] or using more complex setups [20][21][22]. Some form of force sensing on the part of the manipulator is required. Kinematic calibration using plane constraints (with known or unknown parameters of the plane) was explored by Ikits and Hollerbach [23]; they proposed a new approach focusing on the proper definition of the base and end link frames, evaluated primarily in simulation. Zhuang et al. [24] explored multiple variants of plane constraints, with and without known plane parameters, and demonstrated their results on a PUMA 560 robot. In particular, they showed that a single-plane constraint does not necessarily guarantee that all kinematic parameters of the robot will be observable. On the other hand, a multiple-plane constraint should be a remedy to this problem.
They show that the data collected from 3 planar constraints is equivalent to the data collected from a point measurement device provided that: 1) all three planes are mutually non-parallel; 2) the identification Jacobian of the unconstrained system is nonsingular; and 3) the measured points from each individual plane do not lie on a line on that plane. Joubair and Bonev [25] showed how multi-planar constraints can significantly improve the accuracy of calibration of a six-axis serial robot. Zenha et al. [26] had the simulated iCub humanoid touch three known planes, employing adaptive parameter estimation (Extended Kalman Filter) for kinematic calibration. Khusainov et al. [27] exploit a specific type of mechanical coupling as they fix the end effector of a manipulator to the legs of a humanoid robot that is being calibrated. A point and distance constraint can also be obtained from visual sensing [28].
Calibration by self-contact (self-touch). Self-contact constitutes a specific, less common way of kinematic loop closure available only to humanoid-like or dual-arm setups. Additionally, sensory and motor equipment that allows this self-contact to be performed in a controlled manner is needed. One possibility is to utilize artificial electronic skins covering specific areas or complete robot bodies (see [29,30] for recent overviews). A tactile array may be used for contact detection. If accurate spatial calibration of the skin is available, then additional components of the self-contact configuration, where contact occurs on each of the two intersecting chains, can be measured. Roncone et al. [31] performed kinematic calibration on the iCub using autonomous self-touch (index finger on the contralateral forearm); Li et al. [32] employed a dual KUKA arm setup with a sensorized "finger" and a tactile array on the other manipulator. Mittendorfer and Cheng [33] also exploit artificial skin to learn models of robots, but their methods primarily utilize the signals from accelerometers embedded in their multimodal skin. Alternatively, if controlled self-contact can be established but the exact position is not measured, typically when using force/torque sensing, such constraints can also be employed for calibration, as will be demonstrated in this work.
Self-contained multisensorial calibration. There are only a few approaches that exploit "multisensorial" (or "multimodal") input for self-contained calibration. Birbach et al. [2] calibrated the humanoid robot Justin observing its wrist. Sensors were fused by minimizing a single cost function that aggregates the errors obtained by comparing the discrepancies between simulated projections (left and right camera images, Kinect image, Kinect disparity) and the wrist position from forward kinematics. An inertial term from an IMU in the head was also considered. It is claimed that while pair-wise calibration can lead to inconsistencies, calibrating everything together in a "mutually supportive way" is the most efficient. Limoyo et al. [34] used contact constraints from sliding on a surface together with RGB-D camera information to formulate a self-calibration problem for a mobile manipulator to estimate camera extrinsic parameters and manipulator joint angle biases; the former part is also experimentally verified. Stepanova et al. [35] systematically studied on the simulated iCub humanoid robot how self-observation, self-contact, and their combination can be used for self-calibration and evaluated the relative performance of these approaches by varying the number of poses, initial parameter perturbation, and measurement noise. They found that employing multiple kinematic chains ("self-observation" and "self-touch") is superior in terms of optimization results as well as observability.
Largely orthogonal to the calibration types mentioned above, observability and identifiability of the system and the speed of optimization convergence can be improved by (i) combining geometric and parametric approaches to kinematic identification [36], (ii) improving the error model by incorporating the impact of strain wave gearing errors [37] and unifying various error sources [38], or (iii) improving the optimization method [39].
In summary, existing works on self-contained automatic calibration have typically focused on a single approach, relying on self-observation, physical constraints, or self-contact. Birbach et al. [2] combined multiple sensors. However, first, essentially only "self-observation" chains closed through different vision-like sensors in the robot head were used. Second, only the case where all chains were combined using a single cost function was considered. In [34], self-observation and contact information are combined, but the results have a proof-of-concept character. Our work is inspired by the simulation study of Stepanova et al. [35] but presents a completely new setting and results on a different platform. Using a dual-arm industrial robot with force/torque sensing and cameras, we present several formulations of the optimization problem for geometric kinematic calibration and empirical verification using (i) self-contact, (ii) planar constraints, and (iii) self-observation. The pros and cons of the individual methods and their synergies are assessed and compared to an independent calibration using an industrial-quality laser tracker.

Experimental setup and data acquisition
In this section, we introduce the experimental setup: the robot platform and its control, dimensions, and camera calibration. Then we present the process of data acquisition and the structure of the collected datasets.

Robot setup description
For our experiments, we used a robotic platform developed in the CloPeMa project [40] (Fig. 1 and Fig. 2). It consisted of two industrial manipulators Yaskawa-Motoman MA1400 installed on top of a Yaskawa R750 robotic turntable, which allows rotation of the two manipulators around the vertical axis, a control unit, and two computers connected to a local network. Two Nikon D3100 cameras with Nikkor 18-55 AF-S DX VR lenses were mounted side-by-side on the turntable over the robot base, moving along with the turntable. The parameters of the manipulators are given in Table 1. The manipulators were equipped with ATI Industrial Automation Mini45 6-axis force/torque (F/T) sensors, placed between the last link of the manipulator and the end effector. All the different parts of the robot system were integrated into and operated with the Robot Operating System (ROS) [41]. The MoveIt! planning framework was used to control the robot.
We further equipped the robot with custom-made end effectors, one on each robot arm, that can be used to achieve self-contact and, at the same time, visual self-observation by the two cameras. To this end, we designed and 3D printed icosahedron shapes (Fig. 3, left) that have 20 flat faces, where fiducial markers can be placed, interleaved with 10 spherical surface regions that can be used for self-contact. The end effectors (further also referred to as icosahedrons) were then attached to a steel rod. The collision model (Fig. 3, center) for the end effector was added to the robot model. For calibration employing self-contact or contact with a plane, the end effectors are treated as spheres (see Fig. 3, center). The spherical surfaces with corresponding reference frames are shown in Fig. 3, right.
For visual self-observation, having 20 evenly spaced ArUco markers should ensure that at least 3 of them are always seen by each camera unless another part of the robot occludes the view. We used the OpenCV ArUco module to detect the markers in images.

Laser tracker setup
For acquiring reference parameter values and a baseline for self-contained calibration, we used an external laser tracker, the Leica Geosystems Absolute Tracker AT402, which is a portable 3D absolute tracker for fully guided measurement processes [42]. The tracker collects the 3D coordinates of points in the coordinate system of the tracker. The measurement has a resolution of 0.1 microns. The tracker was placed approximately 4 m from the manipulator. Thus, the typical uncertainty U_xyz of the measurement was ±(7.5 + 3 × 4) microns, i.e., a base term of 7.5 microns plus 3 microns per meter of distance [42,43].
The retroreflector was attached to the robot approximately 25 cm from the last joint. During data collection, the distance from the laser tracker was between 2.5 and 4.5 m. The retroreflector and its collision model are shown in Figure 3 (left and center).

Robot control for contact configurations using force feedback
Driving the two manipulator arms into physical contact is necessary for exploiting self-contact in robot calibration. A point in the workspace was chosen where the end effectors should touch. Then, the configuration of each arm and the movement trajectory were obtained using MoveIt!. Each contact consists of three or four phases, depending on the experiment. In the first phase, the right arm moves at a speed of 0.7 m s⁻¹ to a point close to the desired pose. In the self-contact experiment, this is followed by an analogous movement of the left arm. Then the right arm starts moving at a speed of 0.1 m s⁻¹ toward an anticipated contact point (in fact, a small negative distance was used) with the left arm or a plane until a collision is detected by the F/T sensors. The contact thresholds were determined empirically. Once the end effector gets into contact, the arm stops moving and the cameras are triggered to take photographs. Joint angles are recorded. Afterwards, the right arm slowly (0.1 m s⁻¹) moves back to the former position (from the end of the first phase). Then a new target is selected and the whole process is repeated. For the laser tracker experiments, the deceleration phase is skipped and the chosen robot arm moves directly at a speed of 0.7 m s⁻¹ in free space to predefined configurations, sampled so that a significant part of the robot joint space is covered.
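The guarded move described above can be sketched as a simple threshold check on the F/T sensor reading. This is a minimal illustration: the function names and the 2 N threshold are our own assumptions, not the empirically determined values used on the robot, and the real system reads the ATI Mini45 sensors through ROS.

```python
import math

def force_magnitude(fx, fy, fz):
    """Euclidean norm of the measured force vector [N]."""
    return math.sqrt(fx * fx + fy * fy + fz * fz)

def contact_detected(baseline, current, threshold=2.0):
    """Declare contact when the force magnitude rises above the
    pre-contact baseline by more than an empirical threshold [N]."""
    return force_magnitude(*current) - force_magnitude(*baseline) > threshold
```

In the slow approach phase, the arm would poll this check at the control rate and stop (triggering the cameras and logging the joint angles) as soon as it returns true.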

Robot and end effector dimensions and representation
A parametric model of the whole robot, including the custom-made end effectors, was created. The robot's complete kinematic model was described in Denavit-Hartenberg (DH) notation, including the cameras, end effectors, and laser tracker retroreflector.

Initialization of parameters.
To obtain the initial model, we used the nominal parameters of the robot provided by the manufacturer, previously acquired transformation matrices describing the mounting of the robot on the base, the CAD model and manual measurements of the custom end effector and of the laser tracker retroreflector placement (the link EEL1/2 (7b) represents the transformation from the last robot joint to the retroreflector; see Fig. 2, Fig. 3, and Table 1), and manual measurement of the cameras' attachment. Using the Denavit-Hartenberg (DH) notation [44], Table 1 shows the parameters of the manipulators, including the custom end effector or the retroreflector placement. Table 2 shows the two links to every camera.
The manipulator has 6 actuated rotational joints, denoted S, L, U, R, B, and T (in italics), connected by links which are denoted S, L, U, R, B, and T in the order from the turntable to the end effector. Joint S connects the turntable with link S of the robot; joint L connects link S with link L, and so on (see Fig. 2). We added one last link to every manipulator's kinematic chain. These links represent the transformation to the end effector (icosahedron) (EE1, EE2) or to the retroreflector (EEL1, EEL2). Links between the origin and the turntables (TT1 and TT2) were added as well. The camera chain was also expressed in the DH representation and is composed of 2 links: the link between the origin and the camera turntable (TT3, TT4) and the link to the camera entrance pupil (C1, C2). These last links (EE1, EE2, EEL1, EEL2, C1, C2) are not connected by an actual joint, and thus their joint angle was set to zero. These joints may still have a non-zero offset o, though.
The transformation from the base frame to the first joint, i.e. to the turntable joint, is identity. For calibration using the laser tracker, we replaced the icosahedron end effector links with links ending in the retroreflectors (EEL1, EEL2) (see Fig. 2, Table 1 for robot DH parameters, and Table 2 for camera DH parameters). The left and right arm chains finish with the end effectors; the last frame is in the center of each icosahedron or in the laser tracker retroreflector. Note that since the mounting of the two manipulators on the turntable is not identical, the first 4-tuple of DH parameters is also not the same; we recognize two distinct turntable links in the arm chains and two more turntable links in the camera chains. All the chains have the same rotation of the first joint, but all other joints and parameters are independent.
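The chains above are evaluated with standard DH forward kinematics. The sketch below uses the classical DH convention with the joint offset o added to the encoder angle; the link values in the usage example are illustrative, not the robot's actual Table 1 parameters.

```python
import numpy as np

def dh_transform(a, d, alpha, theta):
    """Homogeneous transform of one DH link; theta already includes the joint offset o."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def end_effector_position(dh_rows, joint_angles, offsets):
    """Chain the link transforms; fixed links (EE1, C1, ...) get a zero joint angle."""
    T = np.eye(4)
    for (a, d, alpha), q, o in zip(dh_rows, joint_angles, offsets):
        T = T @ dh_transform(a, d, alpha, q + o)
    return T[:3, 3]
```

For a toy two-link planar arm with unit link lengths, `end_effector_position([(1, 0, 0), (1, 0, 0)], [0, 0], [0, 0])` returns the fully stretched position `[2, 0, 0]`.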

Camera calibration
The calibration of the intrinsic parameters of the cameras was carried out with a dot pattern. The dataset used for calibration was composed of 22 pattern images, each captured with a different position and orientation w.r.t. the cameras. Calibration of the camera matrix K and the distortion coefficient vector d was performed using the OpenCV camera calibration function calibrateCamera [45], with the following pinhole camera model extended with radial and tangential distortion coefficients:

x' = x_c / z_c,  y' = y_c / z_c,  r² = x'² + y'²,
x'' = x'(1 + k₁r² + k₂r⁴ + k₃r⁶) + 2p₁x'y' + p₂(r² + 2x'²),
y'' = y'(1 + k₁r² + k₂r⁴ + k₃r⁶) + p₁(r² + 2y'²) + 2p₂x'y',
u = f_x x'' + c_x,  v = f_y y'' + c_y,

where [x_c, y_c, z_c] is a 3D point in the camera frame, [u, v] is its projection in the image, f_x, f_y and c_x, c_y are the focal lengths and the principal point from K, and d = (k₁, k₂, p₁, p₂, k₃) are the distortion coefficients, presented here in the order in which they are returned from the OpenCV camera calibration function calibrateCamera [45].
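The projection with radial and tangential distortion, using OpenCV's coefficient order d = (k1, k2, p1, p2, k3), can be sketched in a few lines. This is a generic reimplementation for illustration, not the calibration code used in the paper; the camera matrix in the usage example is hypothetical.

```python
import numpy as np

def project(point_cam, K, d):
    """Project a 3D point [xc, yc, zc] in the camera frame to pixel (u, v)."""
    k1, k2, p1, p2, k3 = d
    # Normalized image coordinates.
    x, y = point_cam[0] / point_cam[2], point_cam[1] / point_cam[2]
    r2 = x * x + y * y
    # Radial and tangential distortion, OpenCV convention.
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    # Apply the camera matrix K.
    u = K[0, 0] * x_d + K[0, 2]
    v = K[1, 1] * y_d + K[1, 2]
    return np.array([u, v])
```

With zero distortion, a point on the optical axis projects exactly to the principal point, which is a quick sanity check for any reimplementation.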

Data acquisition and description
An accompanying video illustrating the experiments is available at https://youtu.be/LMwINqA1t9w. There were 5 distinct datasets collected: (1) self-contact/self-touch experiment (both robot end effectors get into contact), (2) contact with a lower and upper horizontal plane, (3) contact with a vertical plane ("wall"), (4) repeatability measurement, and (5) laser tracker dataset. Datasets were acquired using a simple GUI applet [46]. Laser tracker data were also recorded for parts of self-contact and planar contact datasets (not used in this paper but included in the published dataset) [3].
Individual datasets. The whole dataset D_whole is a set of individual datasets:

D_whole = D_st ∪ D_hp ∪ D_vp ∪ D_lt,

where D_st, D_hp, D_vp, and D_lt are datasets collected during the self-contact/self-touch experiment (st), contact with horizontal planes (hp), contact with a vertical plane (vp), and laser tracker experiments (lt), respectively. The world coordinate system is a right-handed Cartesian coordinate system with axes denoted by x, y, z. Units are meters; the origin is in the center of the robot base on the floor, the y-axis points behind the robot, and the z-axis points up. Euler angles refer to the orientation of the end effector with respect to a coordinate system x', y', z' attached to a moving body (the end effector). In our notation, α, β, and γ represent the first, second, and third rotation, respectively. The rotation axis is indicated by the subscript (x, y, z and x', y', z' denote the fixed and current axes, respectively).
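Rotations about current (moved) axes, as used in the Euler-angle notation above, compose by right-multiplication. The minimal sketch below illustrates this convention; the axis order passed in is up to the caller, since the dataset states the axis of each angle in its subscript.

```python
import numpy as np

def basic_rotation(axis, angle):
    """Elementary rotation matrix about the x, y, or z axis."""
    c, s = np.cos(angle), np.sin(angle)
    if axis == "x":
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
    if axis == "y":
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def euler_to_matrix(axes, angles):
    """Rotations alpha, beta, gamma about the current axes compose as
    R = R(axis1, alpha) @ R(axis2, beta) @ R(axis3, gamma)."""
    R = np.eye(3)
    for axis, angle in zip(axes, angles):
        R = R @ basic_rotation(axis, angle)
    return R
```

For rotations about the fixed axes instead, the elementary matrices would be left-multiplied in the same order.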
Contact points of the end effectors in datasets D_st, D_hp, and D_vp were planned on grids in the manipulators' workspace, as shown in Fig. 4 and Fig. 5 for the self-contact and the contact with planes, respectively. Several end effector orientations were tested for every position. A detailed description of the individual datasets with contact points and orientations is provided in Table 3. Every experiment was repeated twice; a few configurations could not be reached due to robot motion planner failure. In all cases, photographs of one or both icosahedrons in every pose were taken and added to the dataset. In addition, for every setup (self-touch, horizontal and vertical plane), 20 repetitions in 2 different positions were performed to evaluate the measurements' repeatability (Repeatability measurement dataset). For kinematic calibration, the distribution of robot joint angles is important; it is visualized in Fig. 6 for the different experiments.

Dataset structure. Each dataset point contains: the estimated positions of the right and left icosahedron centers, x_I,r and x_I,l, respectively (computed from forward kinematics); the joint configuration of the robot θ = {θ_1^ra, ..., θ_N^ra, θ_1^la, ..., θ_N^la} (ra and la denoting the right and left arm, respectively); the coordinates of every marker (K being the number of markers) in each of the cameras, u = {u_1^rc, ..., u_K^rc, u_1^lc, ..., u_K^lc} (rc and lc denoting the right and left camera, respectively); and the position of the laser tracker ball retroreflector in the tracker coordinate system, including the uncertainties U95 [42] of position detection and the tracker measurement timestamp, L = {x_L, u_L, t_L}. We also saved for reference the magnitude of the force measured by both force sensors before (F_b) and during (F_a) the contact for each arm, F = {F_b^ra, F_a^ra, F_b^la, F_a^la}, and the names of the saved camera images, n_r and n_l.
Individual dataset points are organized in a matrix, where each line i of the matrix corresponds to one dataset point. The whole dataset D_whole contains M = 1268 logged poses with 23022 marker reprojections in total (12371 from the right camera and 10651 from the left camera), which makes approximately 25 marker reprojections per pose for the self-contact experiment and 11 marker reprojections per pose for the planar constraints. There are also 586 poses acquired by the laser tracker; this dataset does not contain marker reprojections. The reprojections are sorted from the lowest to the highest marker ID for every camera image. If a marker was not found in the image, its coordinates are denoted as (NaN, NaN). Similarly, if data were not measured or captured by the Leica tracker, all corresponding values are filled with NaN.
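When loading the dataset, the (NaN, NaN) placeholders for undetected markers must be filtered out before reprojection errors are computed. A minimal sketch of such a filter follows; the (marker_id, u, v) row layout is our own assumption for illustration, not the exact column order of the published csv files.

```python
import math

def valid_reprojections(marker_rows):
    """Keep only (marker_id, u, v) rows where the marker was actually detected,
    i.e. where neither pixel coordinate is NaN."""
    return [(mid, u, v) for mid, u, v in marker_rows
            if not (math.isnan(u) or math.isnan(v))]
```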
For optimization, we transformed the dataset so that one line relates to one data point of a single robot configuration. The dataset and its description can be downloaded from [3]. For the positions that include Leica measurements, a csv file with the (x, y, z) positions detected by the laser tracker is available.

Multi-chain robot calibration
In multi-chain robot calibration we estimate the parameter vector

φ = {(a_k, d_k, α_k, o_k) : k ∈ N},

where N = {1, ..., n} is a set of indices identifying individual links; a_k, d_k, and α_k are the first three parameters of the DH formulation of link k; o_k is the offset that specifies the positioning of the encoders on the joints with respect to the DH representation. This is sometimes referred to as static geometric calibration. We often estimate only a subset of these parameters, assuming that the others are known. This subset can, for example, consist of a subset of links N' ⊂ N (e.g., only the parameters of one arm are to be calibrated) or a subset of the link parameters.
Here we focus on offsets in the revolute joints o (sometimes dubbed "daily calibration" [47]).
Let D ⊂ D_whole denote the set of robot configurations (dataset points) used for optimization. Each dataset point D_i contains, among other items, the index c_i of the camera used, the position u_i = (u_i, v_i) of the marker center in the camera image, and the current robot joint configuration θ_i (joint angles from the joint encoders for the given robot configuration). Estimation of the parameter vector φ is done by optimizing a given objective function f(φ, D, ζ):

φ* = argmin_φ f(φ, D, ζ),  f(φ, D, ζ) = Σ_{i=1}^{M} ‖g(φ, D_i, ζ)‖²,

where M is the number of robot configurations and corresponding end effector positions used for calibration (hereafter often referred to as "poses" for short), φ is a given parameter estimate, dataset point D_i includes the joint angles θ_i from the joint encoders for the given robot configuration, and the constant vector ζ defines all other necessary parameters such as camera calibration, fixed transformations, fixed DH parameters, or other properties of the robot. For chains involving cameras, the function g(φ, D_i, ζ) refers to the reprojection error as described in the next section, while for the chains including contact, it corresponds to the distance between a real (observed) end effector position p_i^r and an estimated end effector position p_i^e computed using the forward kinematic function for a given parameter estimate φ and the joint angles θ_i from the joint encoders. For the case of planar constraints, the shortest distance of the end effector position (icosahedron center) to the parameterized plane is minimized.
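The structure of such a combined objective can be sketched as follows: each dataset point contributes one residual g depending on its chain type, and per-chain weights allow the terms to be scaled by sensor accuracy. This is a simplified illustration, with the dict-based dataset layout and chain keys being our own assumptions; a real implementation would hand the stacked residual vector to a nonlinear least-squares solver such as scipy.optimize.least_squares.

```python
def objective(phi, dataset, residual_fns, weights):
    """Weighted sum of squared per-point residuals over all chains.
    residual_fns maps a chain type (e.g. 'st', 'hp') to its g(phi, D_i)."""
    total = 0.0
    for point in dataset:
        g = residual_fns[point["chain"]]          # g(phi, D_i, zeta) for this chain
        total += weights[point["chain"]] * g(phi, point) ** 2
    return total
```

For a single scalar parameter phi = 3.0 with one self-contact point whose residual is phi minus a target of 1.0 and weight 2.0, the objective evaluates to 2.0 · (3.0 − 1.0)² = 8.0.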
We study different combinations of intersecting chains and their performance in calibrating one another. The specific form of the function g(φ, D_i, ζ) for the individual considered chains and their combinations is discussed in the following subsections. At the end of every subsection, we also state how many components of the pose can be measured using the different methods, similarly to the analysis for standard open-loop and closed-loop calibration approaches in [1].
For problem representation, optimization, and some visualization, we employed the multisensorial calibration toolbox [48].

Self-contact -two arms chain (LA-RA)
This corresponds to the self-contact scenario in which contact occurs directly between the end effectors of the Left Arm (LA) and Right Arm (RA). As described in Section 3.1, for contact, the end effectors can be treated as spheres and the contact can occur between any of the 10 spherical tiles. The newly established kinematic chain for the upper body includes both arms; the head and eyes are excluded (see Fig. 9). To optimize parameters describing this chain, we minimize the distance between estimated positions in the 3D space of left and right arm end effectors (see Fig. 10).
In this case, the parameter vector φ consists of the following parameters: φ = {φ_ra, φ_la}, where φ_ra and φ_la are the DH parameters being calibrated for the robot right and left arm, respectively. The objective function to be optimized is:

f(φ, D_st, ζ) = Σ_{D_i ∈ D_st} (c(φ, D_i, ζ) − q(ζ))²,   (7)

Figure 10: Visualisation of the self-contact scenario. When end effectors are in contact, they are separated by distance d. This can happen in multiple configurations. For simplicity, we visualize the case when only the end effector link length is estimated for one arm (left) or both arms (right), with the additional assumption that the link length is the same for both arms.
where the function c(φ, D_i, ζ) computes the distance of the end effector centers in the configuration given by the dataset point D_i, and D_st is a subset of the dataset points with a particular robot configuration (see Dataset structure in Sec. 3.6 for details). The distance of the end effector centers at contact, marked as q(ζ), is equal to one icosahedron diameter 2r, because both end effectors have identical shape. For the icosahedron diameter, we took the value 2r = 116 mm from the CAD model and kept it fixed [46].
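To make the self-contact residual concrete, it can be sketched as below. This is a minimal illustration under standard DH conventions; the function and parameter names are ours (the paper's implementation uses a MATLAB toolbox), and the arm parameterization is reduced to joint offsets only.

```python
import numpy as np

def dh_matrix(theta, d, a, alpha):
    """Standard Denavit-Hartenberg link transform."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def end_effector_center(dh_params, joint_angles, offsets):
    """Forward kinematics: chain the DH transforms and return the
    3D position of the end effector (icosahedron) center."""
    T = np.eye(4)
    for (d, a, alpha), th, o in zip(dh_params, joint_angles, offsets):
        T = T @ dh_matrix(th + o, d, a, alpha)
    return T[:3, 3]

def self_contact_residual(phi_r, phi_l, theta_r, theta_l, dh_r, dh_l, r=0.058):
    """c(phi, D_i, zeta) - q(zeta) for self-contact: distance between the
    two end effector centers minus one icosahedron diameter 2r (r = 58 mm)."""
    p_r = end_effector_center(dh_r, theta_r, phi_r)
    p_l = end_effector_center(dh_l, theta_l, phi_l)
    return np.linalg.norm(p_r - p_l) - 2.0 * r
```

The optimizer drives this residual toward zero over all contact configurations in D_st, which is how errors in the kinematic parameters manifest as a violated contact constraint.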
As can be seen, the objective function contains a set of constraints on the distances between the end effector positions. These constraints can be written for the case of self-contact as follows: let (x_i^r, y_i^r, z_i^r) and (x_i^l, y_i^l, z_i^l) be the centres of the right and left arm end effectors computed from forward kinematics for data point D_i with the given joint configuration θ_i. Then for each data point D_i we get the following constraint (the left side corresponds to c(φ, D_i, ζ) and the right side to q(ζ) in Eq. 7):

√((x_i^r − x_i^l)² + (y_i^r − y_i^l)² + (z_i^r − z_i^l)²) = 2r.

According to [1], this corresponds to restricting 1 position parameter. The parameters of the touched surface (2r) are in this case known, same as for planar constraints with known plane parameters.

Contact with a plane (LA/RA-plane)
This corresponds to the scenario where the Left (LA) or Right (RA) robotic arm is getting into contact with a plane (see Fig. 11). In this type of optimization problem, we can distinguish formulations including single-plane or multiple-plane constraints [24,25]. The classical formulations of the problem use either a general equation of the constraint plane or plane normals [23]. The general equation of a plane is:

ax + by + cz + d = 0,   (9)

where n = (a, b, c) is the plane normal vector. The parameters of the plane can be known in advance (as in [23], [26], or [20], where a calibration cube is used) or unknown (as in our case or [24]). The planes and their corresponding parameters are visualized in Fig. 12.
When the parameters of the plane are unknown, the parameter vector φ consists of the following parameters: φ = {φ_ra/la, n, d}, where φ_ra/la are the DH parameters of the robot arm in contact, n is the plane normal, and d is the distance of the plane from the origin. We formulated the objective function as the distances between the contacts and a single or multiple fitted planes:

f(φ, D_p, ζ) = Σ_j Σ_{D_i^j ∈ D^j} c(φ_j, D_i^j, ζ)²,   (10)

Figure 12: Visualisation of the planes with their corresponding parameters (n_hp, d_hp and n_vp, d_vp) and coordinate frames. The end effector with radius r in contact with the horizontal plane in multiple places is shown.

where φ_hp1, φ_hp2, and φ_vp are the vectors
of parameters for the lower horizontal plane (φ_hp1), the higher horizontal plane (φ_hp2), and the vertical plane (φ_vp), respectively. The vector c(φ_j, D^j, ζ) is a vector of distances between the individual end effector positions and the given plane j for each datapoint D_i^j from the given dataset D^j. The distance is computed using the plane normal and the corresponding plane coordinates as follows: c(φ_j, D_i^j, ζ) = ||n_j · p_i^j(φ_j,ra/la) + d_j||. Point p_i^j is the centre of the end effector computed by forward kinematics from dataset point D_i^j; φ_j,ra/la is the estimated parameter vector corresponding to the DH parameters of the touching arm; n_j = [a b c] is the plane normal; d_j is the distance of the plane from the origin. These plane parameters (n_j and d_j) are estimated in each iteration of the optimization process, from the current point coordinate estimates, by the SVD method described below. The vector ζ wraps up all other necessary parameters.
To acquire the parameters of the fitted plane in each iteration, the measured points are converted to homogeneous coordinates and their centre of gravity is computed and subtracted from all points. Afterwards, the MATLAB function svd is called, and the singular vector corresponding to the smallest singular value is taken as the normal of the plane. Parameter d in Equation 9 is calculated from:

d = −(a x_0 + b y_0 + c z_0),

where (x_0, y_0, z_0) are the coordinates of the points' center. Let n = [a, b, c] be a plane normal, d the distance of the plane from the origin, (x_i^r, y_i^r, z_i^r) the centre of the right arm end effector computed from forward kinematics for data point D_i with the given joint configuration θ_i, and r the radius of the icosahedron. Then for each data point D_i in the dataset D_hp/vp we get the following constraint (the left side of the equation corresponds to c(φ_vp/hp, D_vp/hp, ζ) in Eq. 10):

||a x_i^r + b y_i^r + c z_i^r + d|| = r.

As per [1], this type of calibration corresponds to the restriction of 1 position parameter. Compared to the self-contact setup, in the case of planes with unknown parameters (our case), new calibration parameters (the plane parameters n and d) have to be added.
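The SVD plane-fitting step described above can be sketched as follows (a minimal numpy version of what the paper does with the MATLAB svd function; function names are ours):

```python
import numpy as np

def fit_plane_svd(points):
    """Fit a plane to Nx3 points: subtract the centroid, take the right
    singular vector belonging to the smallest singular value as the unit
    normal n, then set d = -(n . centroid) so that n . p + d = 0 on the plane."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    n = vt[-1]                 # singular vector of the smallest singular value
    d = -float(n @ centroid)   # d = -(a*x0 + b*y0 + c*z0)
    return n, d

def plane_distances(points, n, d):
    """Residuals c(phi_j, D_i, zeta) = |n . p_i + d| for each point."""
    return np.abs(points @ n + d)
```

Because the normal from SVD is a unit vector, |n · p + d| is directly the Euclidean point-to-plane distance used in the objective.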
Self-observation (LA/RA-LEye/REye)
This corresponds to the scenario where we observe the Left Arm (LA) or Right Arm (RA) end effector with ArUco markers via the Left Camera (LEye) or Right Camera (REye) (see Fig. 13). We calibrate: (i) the extrinsic parameters of the cameras (in our case as DH links) while assuming the robot DH parameters to be known, or (ii) the whole kinematic chain of the robot arm simultaneously with the camera extrinsic parameters. In this case, the optimization minimizes the reprojection error between the observed ArUco marker positions in the camera and the positions estimated using the current kinematic model. To obtain the projected marker positions (which determine the end effector position) in each of the robot cameras, we apply the calibrated camera model. The camera was precalibrated using a standard camera intrinsic calibration (see Section 3.5). The extrinsic parameters of the camera (in our case expressed as DH parameters) were precalibrated from reprojections of ArUco markers using the fixed nominal parameters of the arm. First, we have to transform the marker positions to the camera frame:

[x_c, y_c, z_c, 1]^T = T_m^camera [0, 0, 0, 1]^T,

where [x_c, y_c, z_c, 1]^T are the homogeneous coordinates of the marker position in the frame of the given camera and T_m^camera is the transformation from the marker M_m to the given camera, obtained through the standard approach for forward kinematics using DH parameters. Afterwards, we apply a standard pinhole camera model extended with radial and tangential distortion coefficients and transform the 3D point in the camera frame ([x_c, y_c, z_c]) into image coordinates [u, v] (the 2D plane of the camera) (see the camera model in Sec. 3.5). The actual marker position means the center of the ArUco marker. The OpenCV function calibrateCamera provides calibration with a resolution of whole pixels. ArUco marker detection is done by the OpenCV function cv2.aruco.detectMarkers, which provides the coordinates of all four marker corners.
From these, the center of the marker is calculated as an intersection of the diagonals connecting the marker corners. The error resulting from this assumption is smaller than the calibration error [46].
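The diagonal-intersection construction of the marker center can be written down directly (an illustrative helper of our own; corner ordering follows the convention of detectMarkers, so the diagonals connect corners 0-2 and 1-3):

```python
def marker_center(corners):
    """Center of an ArUco marker as the intersection of the diagonals
    c0-c2 and c1-c3 (corners as four (x, y) pixel coordinates)."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = corners
    # Solve c0 + t*(c2 - c0) = c1 + s*(c3 - c1) for t by Cramer's rule.
    dx1, dy1 = x2 - x0, y2 - y0
    dx2, dy2 = x3 - x1, y3 - y1
    denom = dx1 * dy2 - dy1 * dx2
    t = ((x1 - x0) * dy2 - (y1 - y0) * dx2) / denom
    return (x0 + t * dx1, y0 + t * dy1)
```

Unlike averaging the four corners, the diagonal intersection stays consistent under the projective distortion of a tilted marker, which is why it is a reasonable proxy for the projected marker center.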
The parameter vector φ consists of the following parameters: φ = {φ_ra/la, φ_rc/lc}, where φ_ra, φ_la, φ_rc, and φ_lc are the calibrated DH parameters corresponding to the right arm, left arm, right camera, and left camera, respectively. The objective function is formulated as the distance between the projected markers and their pixel coordinates in the images:

f(φ, D_so, ζ) = Σ_{D_i ∈ D_so} ||[u_i, v_i]^T − [u_i^m, v_i^m]^T||²,   (13)

This approach does not require information from both eyes and enables us to estimate only one side of the robot body (e.g., the parameters of the left arm and left camera). For example, the estimated parameter vector φ in the case of the kinematic chain connecting the left arm and left camera consists of the following parameters: φ = {φ_l, φ_lc}, where φ_l and φ_lc are the parameters corresponding to the robot left arm and to the left camera, respectively.
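The projection step feeding this reprojection error can be sketched with the standard pinhole-plus-distortion model (OpenCV convention with radial coefficients k1, k2, k3 and tangential p1, p2; the function names and default-zero distortion are our illustrative choices):

```python
def project_point(p_cam, fx, fy, cx, cy, k=(0.0, 0.0, 0.0), p=(0.0, 0.0)):
    """Pinhole projection with radial (k1, k2, k3) and tangential (p1, p2)
    distortion: camera-frame 3D point -> pixel coordinates (u, v)."""
    x, y, z = p_cam
    xn, yn = x / z, y / z                      # normalized image coordinates
    r2 = xn * xn + yn * yn
    radial = 1.0 + k[0] * r2 + k[1] * r2**2 + k[2] * r2**3
    xd = xn * radial + 2 * p[0] * xn * yn + p[1] * (r2 + 2 * xn * xn)
    yd = yn * radial + p[0] * (r2 + 2 * yn * yn) + 2 * p[1] * xn * yn
    return fx * xd + cx, fy * yd + cy

def reprojection_residual(p_cam, uv_measured, fx, fy, cx, cy):
    """Per-marker residual: projected minus detected pixel coordinates."""
    u, v = project_point(p_cam, fx, fy, cx, cy)
    return u - uv_measured[0], v - uv_measured[1]
```

In the calibration loop, p_cam is the marker center transformed into the camera frame by the current kinematic estimate, so the residual couples the arm DH parameters and the camera extrinsics.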
The conditions in the objective function can be expressed as follows: let x_m be the centre of origin of marker m_i (obtained by forward kinematics for the given DH parameter estimate), x_c the marker position in the given camera frame, T_m_i^EEF the transformation from marker m_i to the arm end effector (the transforms to the individual ArUco markers can be found at the dataset page [3]), T_EEF^camera the transformation from the end effector to the given camera frame, and p(x_c) the reprojection of a 3D point in the camera frame to image coordinates [u, v] (see Eq. 1):

x_c = T_EEF^camera T_m_i^EEF x_m,    [u_i, v_i]^T = p(x_c).

Then, for each data point i (corresponding to the position of marker m_i in the configuration θ_i), the reprojection of marker m_i to the camera frame [u_i, v_i], and each measured marker position in the camera image (u_i^m, v_i^m), we get two equations:

u_i = u_i^m,    v_i = v_i^m.

According to [1], this type of calibration corresponds to the restriction of 2 position parameters. Still, we have to add the camera DH parameters (see Table 2) to the parameters being calibrated to enable the reprojection of markers to the camera frame.

Combining the calibration chains
The overall objective function can be generally defined as (depending on which datasets and criteria we want to use for calibration):

f(φ, D, ζ) = k_st ∘ g_st(φ, D_st, ζ) + k_p ∘ g_p(φ, D_p, ζ) + k_so ∘ g_so(φ, D_so, ζ),

where D_st, D_p = {D_hp1, D_hp2, D_vp}, and D_so = {D_st, D_p} are the datasets for self-touch, planar constraints, and self-observation, respectively. Parameters k_st, k_p, k_so are scale factors reflecting the different uncertainty/reliability of the components, the number of measurements per configuration, and the conversion between distance errors given in meters and reprojection errors in pixels. The symbol ∘ marks the Hadamard product, i.e., (k_st ∘ g_st)_i = k_st,i · g_st,i. The value of these parameters is set independently for each pose:

k_i = η_i p_i μ_i,

where η_st, η_p, η_so reflect the reliability of the measurement (e.g., η_i = σ_i^(−1/2), where σ_i is the uncertainty of the measurement in the given pose). In this work, η = 1 was used for all approaches.
The parameter p_i reflects the fact that multiple markers are detected by the cameras for a given contact configuration. Therefore, in the case of the (planar constraints / self-touch) contact and self-observation combinations, p_p = 10 and p_st = 20 (there are two icosahedrons in contact and on average 20 marker detections per contact event), and p_so = 1. The coefficient μ_i is determined from the intrinsic parameters of the cameras (60 deg horizontal view angle, image size 4000 × 6000 px) and the distance d_i of the end effector from the camera: μ_i,x = 4000 px/(d_i (π/3)), μ_i,y = 6000 px/(d_i (π/3)). For simplicity, μ_i,x = μ_i,y was used. Figure 14 shows the connections of the different calibration chains and constraints (e.g., the distance between end effectors during self-contact or the distance between the end effector and a plane for a planar constraint).
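The per-pose weight k_i = η_i p_i μ_i can be sketched as follows (a small-angle approximation of the meters-to-pixels factor μ from the stated 60° field of view; function names are ours):

```python
import math

def pixel_scale(distance_m, width_px=4000.0, fov_rad=math.pi / 3):
    """Approximate pixels-per-meter mu at object distance d: the field of
    view spans roughly d * FOV meters across width_px pixels."""
    return width_px / (distance_m * fov_rad)

def pose_weight(eta=1.0, p_detections=1.0, distance_m=1.0):
    """k_i = eta_i * p_i * mu_i: measurement reliability, number of
    detections per configuration, and the meters-to-pixels conversion."""
    return eta * p_detections * pixel_scale(distance_m)
```

For example, a self-touch pose (p_st = 20) observed 1 m from the camera is weighted 20 × μ(1 m), which puts millimeter-scale contact residuals on a footing comparable to pixel-scale reprojection residuals.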

Calibration using laser tracker
In this scenario, the robot arm with an attached retroreflector moves in free space to configurations selected by sampling the joint space such that the retroreflector faces the laser tracker (more details under Laser tracker experiment in Section 3.6). The distance between the retroreflector position acquired by the laser tracker and the position computed from the current robot arm DH parameters (plus the current joint angle values, using forward kinematics) is minimized.
The parameter vector φ consists of the following parameters: φ = {φ ra/la , R, T}, where φ ra/la are DH parameters corresponding to the right/left robot arm (with the link EEL1/EEL2 to the retroreflector -see Table 1), R and T are rotation and translation matrices defining laser tracker position w.r.t. the robot base frame.
The objective function is formulated as the error of distances:

f(φ, D_tracker, ζ) = Σ_{D_i ∈ D_tracker} p(φ, D_i, ζ)²,

where the function p(φ, D_i, ζ) = ||x_i^L − x_i^r|| computes the distance between the transformed point x_i^L from the laser tracker and the point x_i^r obtained from forward kinematics and the current estimate of the robot DH parameters in the configuration given by the dataset point D_i.
To calibrate the robot DH parameters, we minimize the distance between these 2 sets of 3D points (set X L contains points from laser tracker in its coordinate system and set X r includes points in the base coordinate system computed from a joint configuration using forward kinematics and the current estimate of robot DH parameters) using an iterative approach.
In each iteration of the optimization process we: (1) recompute the estimate of the robot DH parameters; (2) recompute the rotation and translation matrices defining the laser tracker position w.r.t. the base frame. The relation between corresponding points in the two sets is generally:

x_i^L = R x_i^r + T + N_i,

where R and T are the rotation and translation matrices defining the laser tracker position w.r.t. the robot base frame and N_i is the noise for the i-th datapoint. We used the algorithm introduced by Arun et al. [49] for finding the least-squares solution for R and T. It is a non-iterative algorithm based on the singular value decomposition of a 3×3 matrix.
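Step (2), the Arun et al. alignment, can be sketched in a few lines (a minimal numpy version; the reflection guard via the sign of the determinant follows the standard formulation):

```python
import numpy as np

def arun_rigid_transform(X, Y):
    """Least-squares R, T with Y ~ R X + T (Arun et al., 1987):
    center both Nx3 point sets, SVD the 3x3 cross-covariance, and fix a
    possible reflection via the sign of det(V U^T)."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    H = (X - cx).T @ (Y - cy)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                         # optimal rotation
    T = cy - R @ cx                            # optimal translation
    return R, T
```

Here X would hold the forward-kinematics points and Y the laser tracker points; the recovered (R, T) is then held fixed while the DH parameters are re-estimated, and the two steps alternate.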
This is a standard open-loop calibration method in which 3 components of the pose are measured: the 3D position, not orientation, is acquired from the laser tracking system [1]. The transform (R, T) relating the robot base frame to the laser tracker frame of reference has to be added to calibration. Since we express rotation by rotation vector, this corresponds to adding 6 parameters to calibration.

Non-linear least squares optimization
For solving the optimization problem, the Levenberg-Marquardt iterative algorithm was employed. This is a standard choice for kinematic calibration (e.g., [2,14,50]) and combines the advantages of the gradient descent and Gauss-Newton algorithms, either of which could also be applied to this problem. For the implementation, we used the MATLAB Optimization Toolbox and the nonlinear least-squares solver lsqnonlin with the Levenberg-Marquardt option and the parameters typicalx and scaleproblem.
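The core of the Levenberg-Marquardt iteration can be sketched as follows. This is a bare-bones educational loop with a forward-difference Jacobian and a simple damping schedule, not the lsqnonlin implementation the paper uses:

```python
import numpy as np

def levenberg_marquardt(residual, phi0, n_iter=50, lam=1e-3):
    """Minimal LM loop: solve (J^T J + lam I) step = -J^T r; shrink lam on
    a successful step (toward Gauss-Newton), grow it on failure (toward
    gradient descent)."""
    phi = np.asarray(phi0, dtype=float)
    r = residual(phi)
    for _ in range(n_iter):
        J = np.empty((r.size, phi.size))
        for j in range(phi.size):              # forward-difference Jacobian
            dp = np.zeros_like(phi); dp[j] = 1e-7
            J[:, j] = (residual(phi + dp) - r) / 1e-7
        step = np.linalg.solve(J.T @ J + lam * np.eye(phi.size), -J.T @ r)
        r_new = residual(phi + step)
        if r_new @ r_new < r @ r:              # accept the step
            phi, r, lam = phi + step, r_new, lam * 0.3
        else:                                  # reject and increase damping
            lam *= 10.0
    return phi
```

In the calibration setting, residual would stack the weighted contact, plane, and reprojection residuals over the training poses, and phi the parameter vector of Sec. 4.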

Observability and identifiability of parameters
According to [51], an observability index measures the quality of the dataset based on the identification Jacobian matrix J, which represents the sensitivity of the minimized values to changes of the individual parameters. Borm and Menq [52] proposed the measure O_1; Driels and Pathre [53] proposed O_2; Nahvi and Hollerbach proposed the measures O_3 [54] and O_4 [55]. All these measures can be computed from the singular value decomposition of J. They are defined as:

O_1 = (σ_1 σ_2 ... σ_m)^(1/m) / √n,   O_2 = σ_m / σ_1,   O_3 = σ_m,   O_4 = σ_m² / σ_1,

where σ_j is the j-th singular value, m is the number of independent parameters to be identified, and n is the number of calibration configurations.
The identification Jacobian matrix itself shows the identifiability of the individual optimized parameters: J(i, j) = ∂X_i/∂φ_j, where X_i is a distance (Eq. 7 and Eq. 10) or a reprojection error (Eq. 13) and φ_j is the parameter to be estimated. If a matrix column related to a parameter consists only of zeros, the parameter is unidentifiable, which leads to an unobservable problem (the minimal singular value is zero). According to [1], an unidentifiable parameter means that the experimental setup does not allow it to be identified, not that it is intrinsically unidentifiable. The identifiability can be improved by adding additional sensors to the setup as well as by extending the number of poses in the dataset. In our analysis, we compare the observability indices O_1 (representing the volume of the hyperellipsoid specified by the singular values) and O_4 (the noise amplification index, which measures both the eccentricity of the hyperellipsoid through O_2 and its size through O_3) (see [1] for an overview) for the individual chains and estimated parameter vectors.
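The four indices follow directly from the singular values of J, as sketched below (an illustrative helper; here n is taken as the number of Jacobian rows, whereas conventions differ on whether n counts rows or calibration configurations):

```python
import numpy as np

def observability_indices(J):
    """O1..O4 from the singular values of the identification Jacobian J
    (classical definitions; sigma sorted in descending order by svd)."""
    n = J.shape[0]
    s = np.linalg.svd(J, compute_uv=False)
    m = s.size
    O1 = s.prod() ** (1.0 / m) / np.sqrt(n)    # hyperellipsoid volume
    O2 = s[-1] / s[0]                          # inverse condition number
    O3 = s[-1]                                 # minimum singular value
    O4 = s[-1] ** 2 / s[0]                     # noise amplification index
    return O1, O2, O3, O4
```

A zero column in J makes σ_m = 0, which drives O_1, O_3, and O_4 to zero, matching the unidentifiability criterion above.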

Perturbation of the initial parameters estimate
To evaluate the dependence of the optimization performance on the quality of the initial parameter estimates, we perturbed all estimated parameters by a perturbation factor p = {1, 3, 10} (in the experimental section, we show results for p = 3). We perturbed all initial offset values o_i as follows: where U(a, b) is the uniform distribution between a and b. It is reasonable to expect that the remaining DH parameters (α, a, and d) will in general be more accurate, as they can be extracted from CAD models and no moving part or encoder is involved. Therefore, their perturbation was chosen as follows:
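A perturbation scheme of this shape can be sketched as below. The exact perturbation bounds are defined by the paper's (omitted) equations; the delta values here are placeholders of our own, chosen only to illustrate that the offsets receive a wider uniform perturbation than the CAD-derived geometric parameters:

```python
import random

def perturb_offsets(offsets, p, delta=0.02):
    """Perturb joint offsets by p * U(-delta, delta) rad; delta is an
    assumed bound, not the paper's value."""
    return [o + p * random.uniform(-delta, delta) for o in offsets]

def perturb_geometry(params, p, delta=0.002):
    """Perturb the remaining DH parameters (alpha, a, d) with a smaller
    assumed bound, reflecting the higher trust in CAD-derived values."""
    return [x + p * random.uniform(-delta, delta) for x in params]
```

Scaling the same draw by the factor p = {1, 3, 10} then yields progressively worse initial estimates for the robustness study.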

Evaluation
Before optimization, we randomly divided the dataset into training and testing data with a ratio of approximately 70 to 30. The training and testing datasets contained distinct sets of poses; multiple observations of markers in a given pose were all added to the same batch. Optimization was based on the training data; the testing data were used to evaluate the result. We used the root-mean-square (RMS) error to compare the results from multiple optimization runs. It is computed as:

RMSE = √( (1/M) Σ_{i=1..M} (k_c g(φ, D_i, ζ))² ),   (21)

where M is the number of observations/measurements and k_c is a scale factor for the given calibration approach c (see Sec
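The pose-level split and the RMS metric can be sketched as follows (illustrative helpers of our own; each element of poses stands for one pose with all its observations kept together):

```python
import math
import random

def split_dataset(poses, train_ratio=0.7, seed=0):
    """Split by pose (not by individual observation), so that all
    measurements of one pose end up in the same subset."""
    idx = list(range(len(poses)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_ratio * len(idx)))
    return [poses[i] for i in idx[:cut]], [poses[i] for i in idx[cut:]]

def rmse(errors, k_c=1.0):
    """RMS of scaled residuals: sqrt((1/M) * sum((k_c * g_i)^2))."""
    return math.sqrt(sum((k_c * e) ** 2 for e in errors) / len(errors))
```

Splitting at the pose level matters: putting some observations of a pose in training and others in testing would leak the pose into the test set and make the generalization estimate optimistic.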

Accuracy of measuring individual components
The accuracy of several components constitutes a lower bound on the overall accuracy. In particular, we estimated or experimentally evaluated the accuracy of the following (we note in brackets for which chains the accuracy of the given component is relevant): • 3D printed parts and their dimensions: 0.1 mm error based on the printer specification; we assume the end effector rod to be straight. (self-contact (Sec. 4.1) and contact with a plane chains (Sec. 4.2)) • Intrinsic camera calibration: we evaluated the error of the intrinsic camera calibration using multiple calibration patterns split into a training and a testing dataset. The resulting mean reprojection error for the calibration points is 0.73 px on the testing set. This error is composed of the accuracy of detecting the dot pattern and the calibration itself. • ArUco marker detection: we evaluated the accuracy of marker detection as the distance of the detected markers from an interpolated straight line (Fig. 15). The mean error is around 0.33 px for well visible markers (IDs 107, 112, 113) and 0.63 px for the other markers, with a maximal error of 1.5 px. We investigated the error along the line with similar results. Intrinsic camera calibration puts a lower bound on these results. (self-observation chains (Sec. 4.3)) • Repeatability of measurements: the details of the robot movements and how they were stopped once contact was detected are described in Sec. 3.3. We used the laser tracker to measure the repeatability of these movements, using 20 repetitions of the same movement at two different positions sampled from the grid for planar constraints (Fig. 5). These involved only small robot movements (10 cm) and repeatability was high (Fig. 16). For self-touch, larger movements from the robot home position were executed. However, self-contact experiments involved both large movements and small movements similar to those for contact with a plane. Thus, the statistics in Fig. 16 were combined for the self-touch distribution, resulting in overall lower repeatability.
Results for the x-coordinate are shown; a similar distribution and range of errors was observed for the y and z coordinates. (all chains) • Camera projection error propagation: cameras measure projections to the image plane, producing higher uncertainty in the z-axis of the image coordinate system (image depth). The uncertainties in all three axes can be obtained by error propagation from the object position uncertainty. The projection uncertainty can be written as a quadric equation u^T Q u = 1, where Q is a matrix (a 4×4 identity matrix is assumed here) and u is a 4×1 vector of both cameras' projection pixel coordinates. The projection vector u can be expressed as u = J_x X, where J_x is the Jacobian matrix computed from the projection equations (Eq. 1) and X are the projected point coordinates. Combining these two equations, we obtain the equation for the position uncertainty X^T Q_x X = 1, where Q_x = J_x^T Q J_x, which can be interpreted as an ellipsoid. The eigenvalues of Q_x are then a^(−2), b^(−2), c^(−2), where a, b, c are the half-lengths of the principal axes in meters. In our case, the mean values are a = 0.146 mm, b = 0.150 mm, c = 1.330 mm, i.e., an error of 1 pixel corresponds to more than 1 mm error in object position in one axis. The comparison of the uncertainties can be seen in Fig. 16. The length of the last axis of the ellipsoid (image depth) is one order of magnitude larger than the other lengths and than the touch uncertainty. Based on the above-mentioned analysis, the lower bound of the error for self-observation is a combination of the intrinsic camera calibration error (e_i) and the ArUco marker detection error (e_ArUco). The measured displacement of the end effector between contacts is influenced by e_ArUco, and the observed error is systematic. In the worst case, we can estimate the error to be independent of the direction of the impact and consider the maximum observed error for all directions.
As can be seen, the repeatability measurements show that the accuracy is very high, with errors below 0.03 mm. Finally, the analysis of camera errors in 3D space reveals their limited accuracy, in particular in the axis corresponding to depth.
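The ellipsoid analysis above can be reproduced numerically as sketched below (an illustrative helper; the identity pixel covariance matches the assumption in the text, and the Jacobian values are made up for the example):

```python
import numpy as np

def position_uncertainty_axes(J_x, Q=None):
    """Half-axes (meters) of the position-uncertainty ellipsoid
    X^T Q_x X = 1 with Q_x = J_x^T Q J_x; the eigenvalues of Q_x are
    a^-2, b^-2, c^-2 for half-axes a, b, c."""
    if Q is None:
        Q = np.eye(J_x.shape[0])           # identity pixel covariance assumed
    Qx = J_x.T @ Q @ J_x
    eigvals = np.linalg.eigvalsh(Qx)       # ascending eigenvalues
    return 1.0 / np.sqrt(eigvals[::-1])    # half-axes, smallest first
```

A poorly conditioned J_x in the depth direction (small Jacobian entries) yields a small eigenvalue of Q_x and hence a long ellipsoid axis, which is exactly the depth-uncertainty effect reported above.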

Experimental results
To evaluate and compare the results of kinematic parameter calibration across the individual approaches and their combinations, we show results for the right arm of the robot only, as this kinematic chain can be calibrated using all the datasets collected. First, we show the results for the end effector length and angle offset (Section 5.1), without and with camera calibration. Afterward, we show the results of "daily calibration" (calibrating the joint offsets). Each dataset (see Sec. 3.6 for details) was split into a training and a testing part (70:30). The resulting RMS errors (according to Eq. 21) for both training and testing datasets are shown and compared to the case where nominal parameters are used (before). We show the resulting RMSE separately for self-contact distances between icosahedrons ('dist'), distances to the plane ('plane dist'), and camera reprojections in pixels ('mark.') (e.g., Fig. 17). In addition, we investigate the corrections to the parameter values and their mean and variance.
Calibration methods relying on physical contact (self-contact and planar constraints) can be employed independently or in combination with calibration using cameras (self-observation). We distinguish three possibilities that will be used throughout the rest of the Experimental results section: • No cameras. Only contact-based calibration was employed and cameras were not used at all.
• Fixed cameras. The cameras' DH parameters were precalibrated (see Table 2) using the contact-based datasets (self-contact and contact with a plane) from marker reprojection errors, using the nominal values of the robot kinematic parameters. Then, to calibrate the robot right arm, the reprojection of markers to the camera frame is used together with the contact information, while the camera extrinsics stay fixed.
• Camera position calibration (Cam. calib.). Cameras were precalibrated in the same way as in Fixed cameras. During robot kinematic calibration, reprojection errors are combined with contact-based constraints. Additionally, camera extrinsics (expressed in DH parameters) are subject to calibration as well.
Self-observation (S-O), i.e., information from camera reprojection errors, can also be employed independently. The datasets used are those featuring contact (D_st, D_hp, D_whole, etc.), but the constraints arising from physical contact are not employed. Table 4 provides an overview of which parameters can be subject to calibration under the different approaches. The parameters that can be calibrated using the contact-based approaches are marked in black. Fixed cameras additionally enable calibration of the end effector orientation (o_EE1, in green; contact information alone does not provide any information about orientation; see Figures 10 and 12). The camera extrinsic parameters, which may also be subject to calibration, are shown in blue.


Calibration of end effector length and joint angle offset
First, we evaluated the ability to calibrate the length and joint angle offset of the last link (EE1/EE2) with the individual approaches. The diameter of the final part of the custom end effector is known; we treat it as a sphere in this case, as the contact occurs at the spherical tiles. The parameter being calibrated in all cases is the end effector length d_EE1; with fixed cameras, the orientation o_EE1 can also be calibrated. In the case of planar constraints, the plane parameters {n_hp/vp, d_hp/vp} are estimated in addition.
In the self-contact scenario, we compared the case where we assume that the length and offset of the left arm end effector are known with the case where we calibrate both EE1 and EE2 under the assumption that both end effectors have the same length; this assumption is necessary to avoid compensation of the length of one end effector by the other. In Table 5, we show results for the case when both end effector lengths are calibrated simultaneously. The top part of Table 5 summarizes the corrections to the nominal value of the end effector length achieved by the individual calibration approaches. The most consistent estimates (mean corrections over 20 repetitions) across the individual approaches are achieved with fixed precalibrated cameras, where the range of estimates is from 3.89 mm (all) to 4.84 mm (2 horizontal planes), and from 3.11 mm to 5.95 mm for the self-observation approach trained on three different datasets.
The quality of the individual estimates can also be evaluated with respect to the standard deviation (s.d.) across 20 repetitions. For planar constraints without cameras, we get a very high standard deviation (around 4 mm), indicating poor estimates. On the contrary, using the self-contact approach even without cameras results in a standard deviation of around 0.05 mm, which is comparable to the standard deviation achieved when camera information is included (either precalibrated cameras or cameras calibrated together with the end effector length). The addition of cameras also improves the consistency of the end effector length estimates by planar constraints (the standard deviation drops from 4 mm to 0.2 mm for calibrated cameras and to 0.07 mm for fixed cameras). We also evaluated the calibration using only self-observation information. With precalibrated cameras on the whole dataset, we achieve results comparable to the other estimates. When we calibrate the cameras together with the end effector length, the length of the end effector is overestimated and the standard deviation (based on the used training data) is also significantly higher. We compare the orientation corrections using only self-observation calibration (with cameras precalibrated on different subsets of the datasets) or self-observation combined with other constraints (self-contact, planar constraints); see Table 5, bottom part. The corrections to the manually measured orientation parameters vary between 0 and 0.06 rad (0 to 3.4°).
The resulting RMS errors (see Section 4.9) for the end effector calibration, in mm and px, are shown in Fig. 17. For all variants of the optimization problem, both the mean distance RMSE and the marker RMSE decreased compared to the nominal end effector value (Fig. 17). For the variant all with Fixed cameras, the distance RMSE decreased from 3.8 mm before calibration to 1.40 mm after calibration, and the marker RMSE decreased from 20.8 px to 18.0 px. Without cameras, we achieved a slightly better self-contact distance RMSE (1.34 mm), but in this case the end effector orientation is not calibrated, resulting in a higher camera reprojection error. For the case with calibrated cameras, the resulting RMSEs are slightly smaller (mean self-contact distance 1.34 mm and camera reprojection RMSE 16.0 px) than for the case with fixed cameras. The calibration resulted in the estimation of the end effector length and orientation; the corrections of these parameters are listed in Table 5.
For the end effector parameters, we do not have any reference value apart from manual measurements (laser tracker calibration cannot be applied due to the retroreflector placement; see Fig. 3). However, the corrections found using the whole dataset and the combination of all chains (self-touch + planar constraints + cameras) were reliable enough, and consistent adaptation was observed for the length. Hence, for the subsequent calibration of the robot arm kinematics, we used the end effector length estimate of 35.4 cm. For the orientation, no significant adaptation was observed, and the nominal value was kept for the remaining experiments (see Table 1).
To validate the selection of the end effector length, which will be used in what follows as the initial value of this parameter, we performed two additional experiments. First, we systematically varied the initial end effector length parameter before calibration in the range from 10 to 70 cm and evaluated the RMS error after calibration on the self-contact dataset; see Fig. 18. Two solutions with minimal RMS error were found (35.38 cm and 61.4 cm), but the second one can be ruled out, as we do not expect errors bigger than 1 cm from the manual measurement (35 cm). The two solutions can be explained by the nature of self-contact, as multiple possibilities arise from the geometry of the contact (see Fig. 10, right).
Second, the sensitivity to perturbation of the initial end effector length was evaluated for the individual approaches, with and without cameras, and compared to the case without perturbation. The resulting corrections can be seen in Fig. 19. For the case with perturbation and without cameras (centre), two solutions were found by self-touch calibration, corresponding to Fig. 18. When fixed precalibrated cameras are added, we achieve results with a low standard deviation for all calibration setups; for self-touch calibration, only one solution is selected. For self-touch and all conditions with fixed cameras, we achieve the same results for both the perturbed and the non-perturbed initial value.

Calibration of robot joint offsets by an external measurement device (reference)
The whole dataset D_tracker collected by the laser tracker (Leica) was used for the right arm joint angle offset calibration, to create a reference value for our other calibration approaches (see Sec. 4.5). The parameters subject to calibration are listed in Tab. 4. First, we estimated the tracker position (R, T) and the retroreflector DH parameters (a, d, and the joint angle offset o) (link 7b in Table 1). Afterwards, we calibrated all offsets of the robot arm, including the retroreflector DH parameters.
As can be seen in Fig. 20, left, the RMSE improved with further calibration: from 6.08 mm before calibration to 3.03 mm after retroreflector calibration and to 2.64 mm after calibration of all offsets. To achieve the best possible solution quality, we did not split the original dataset into a training and testing part in this case. Instead, we evaluated the RMSE on a separate dataset (the self-contact dataset D_st) using the parameters estimated by the laser tracker calibration and compared it to the RMSE for nominal parameters (Fig. 20, right). In this case, we used the same end effector parameters for both methods and compared the error at the end of the 5th link (see Fig. 20, right); the RMSE decreased to 3.708 mm with the same uncalibrated end effector. The distribution of errors on the laser tracker dataset before and after calibration can be seen in Fig. 21.

Figure 19: Corrections of end effector length compared to nominal values: (left) no perturbation + no cameras, (centre) perturbation + no cameras, (right) perturbation + fixed cameras. all planes - 2 horizontal, 1 vertical; hpvp - upper horizontal + vertical plane; 1hp (lower) - lower horizontal plane; lpvp - lower horizontal + vertical plane; etc.; all - self-contact + all planes
The corrections for retroreflector parameters (compared to

Calibration of robot joint offsets -self-contained approaches
We compared the quality of the right arm joint angle offsets calibration, including the end effector length, across the individual approaches. The self-contact approach (Sec. 4.1), planar constraints (using 1-3 planes; Sec. 4.2), and pure self-observation calibration (Sec. 4.3) were compared to the results for nominal parameters and to the reference parameters acquired by the laser tracker (Sec. 5.2). For the contact-based methods (self-contact, planar constraints), we compared the results with no camera information, with fixed precalibrated cameras, and with cameras being part of the calibration process. The parameters subject to calibration are listed in Tab. 4. Depending on the particular method, additional parameters may be subject to calibration (e.g., parameters of the plane; see the corresponding subsections in Sec. 4). RMSE after calibration of offsets and end effector length without perturbation can be seen in Fig. 22. The RMS self-contact distance error drops from 3.80 mm before calibration to 1.40 mm after end effector calibration, and to 1.31 mm after calibration of all offsets. The distance to plane drops from 1.00 mm before calibration to 0.93 mm after end effector calibration, and to 0.89 mm after calibration of all offsets without cameras (0.91 mm with camera calibration and 0.92 mm with fixed cameras). The distance of reprojected markers in the cameras drops from 20.8 px before calibration to 18.0 px after end effector calibration, to 15.3 px after calibration of all offsets including camera calibration, and to 15.6 px for fixed-cameras calibration.
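Each of the three self-contained approaches contributes a different residual type to the optimization: a distance constraint for self-contact, a point-to-plane distance for planar constraints, and a pixel reprojection error for self-observation. The sketch below illustrates these residuals under common assumptions (spherical end effectors, a plane in Hessian normal form, and a simple pinhole camera); the function names are ours and the exact formulation in Sec. 4 may differ.

```python
import numpy as np

def self_contact_residual(p_left, p_right, r_left, r_right):
    """Self-contact: the two spherical end effectors touch, so the
    distance between their centres should equal the sum of radii."""
    return np.linalg.norm(p_left - p_right) - (r_left + r_right)

def plane_residual(p, n, d):
    """Planar constraint: signed distance of the end effector point p
    from the plane n.x + d = 0 (n is a unit normal)."""
    return n @ p + d

def reprojection_residual(p_cam, K, uv_observed):
    """Self-observation: pinhole reprojection error (in pixels) of a
    marker at p_cam (camera frame) with intrinsic matrix K."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2] - uv_observed
```

All residuals are functions of the kinematic parameters through the forward kinematics, so they can be stacked into one cost vector for a non-linear least squares solver.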
Corrections of the parameters relative to the nominal parameters are shown in Fig. 24. The corrections are smallest (also having the smallest standard deviation) compared to the values achieved by laser tracker calibration for the variant all with fixed precalibrated cameras, i.e., when all of the chains (planar constraints, self-touch, self-observation) are combined to optimize the offset parameters. The resulting correction for parameter d EE1 is 5.46 ± 0.14 mm.

Comparison of nominal DH parameters and former calibration/calibration performed by laser tracker
To evaluate the resulting offset parameters acquired through calibration using the individual approaches, we conducted a comparison study on the laser tracker testing dataset D tracker (which covers the whole right arm workspace, not only the area where calibration using self-contained approaches was performed). To perform such an evaluation, the transform to the laser retroreflector, acquired from the laser tracker calibration, was used for all compared calibration results. These results are shown in Fig. 23.

Figure 23: Comparison of error on the testing dataset (D tracker) - the dataset used for laser tracker reference calibration. Error computed towards laser tracker retroreflector data. x axis - individual datasets/methods used for finding robot kinematics parameters. We compare self-observation (Self-obs.) with one/both precalibrated cameras (on D whole) calibrating robot offsets based on data from individual datasets, calibration using plane constraints (horizontal plane (lower/upper), 2 horizontal planes (tables), vertical plane (vp), and all planes (ap)), self-touch calibration (self-touch), and self-touch + planar constraints (all). Comparison when no cameras are used (Without cams), precalibrated (using self-observation on D whole) but fixed cameras (Fixed cams.), and precalibrated cameras (on D whole) being part of the calibration (With cams calib).

Calibration of robot joint offsets -sensitivity to perturbation
To evaluate the sensitivity of the calibration approaches to perturbation, we ran the experiment with perturbed initial parameters (p = 3) (see Section 4.8 for details). The results can be seen in Fig. 25. For fixed precalibrated cameras, we achieve very similar results as without perturbation, with a comparably low standard deviation. For example, for the variant all with fixed precalibrated cameras, the resulting correction for parameter d EE1 is 5.46 ± 0.14 mm without perturbation and 5.52 ± 0.14 mm with perturbation.

Calibration of all DH parameters
In Figure 26, we show a comparison of RMSE after calibration of all DH parameters (see Table 4) by the method all for different setups (fixed cameras, calibrated cameras, and no cameras). Results on both training and testing datasets are shown and compared to the case where nominal parameters are used (before), to EEF length and orientation calibration by the variant all (EEF cal.), and to right arm offset calibration by the variant all (offset cal.). The self-contact distance, plane distance, and camera reprojection RMSE drop from 1.33 mm, 0.92 mm, and 15.6 px, respectively, after offset calibration, to 1.00 mm, 0.58 mm, and 14.8 px after calibration of all DH parameters with calibrated cameras, and further to 1.00 mm, 0.55 mm, and 15.2 px after calibration with precalibrated fixed cameras. When cameras are not used (noCams), the self-contact distance further drops to 0.59 mm and the planar distance to 0.64 mm. In this case, the end effector offset is not calibrated and the camera reprojection error would be higher.

In Figure 27, we show the resulting corrections of DH parameters (calculated as the difference from the nominal parameters) for the variant all when perturbed DH parameters were used as the initial value (perturbation factor p = 3). We show the initial values of the perturbed DH parameters, and the results of calibration by method all in the case without cameras (using only contact information from planar constraints and self-touch), with precalibrated cameras that are further calibrated during the experiment, and with precalibrated but fixed cameras. In all compared cases, we reach a lower dispersion of the final values, and these values are very close to the industrial nominal DH parameters. The best results, with the lowest standard deviation and closest to the industrial nominal values, are consistently reached when precalibrated fixed cameras are used. When no cameras are used, the corrections of DH parameters for variant all are unrealistically far from the nominal parameters (e.g., for parameter a B1 we get corrections around 0.2 m). When camera calibration is also employed, the corresponding DH parameter corrections are reasonably close to the nominal parameters, and even for initially perturbed parameters (up to 10 cm difference from the nominal parameters) the calibration method reaches reasonable DH parameter values (all corrections are under 5 mm). We also show the parameters from the laser tracker for comparison. The end effector parameters cannot be calibrated by the laser tracker due to its placement (see Fig. 3); therefore, their values do not change from the initial parameters. For the end effector length, we visualise as the nominal parameter the correction to the initial end effector length value achieved by the former end effector calibration (Sec. 5.1).
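The perturbation experiments follow the usual pattern for non-linear least squares calibration with the Levenberg-Marquardt algorithm: start the optimizer from a perturbed initial estimate and check whether it recovers the nominal parameters. The toy single-parameter model and the perturbation scheme below are illustrative assumptions, not the exact setup of Sec. 4.8.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Toy model: the "measurement" is the x-position of a 1-DoF link of
# length L at joint angle q, i.e. x = L * cos(q).
L_nominal = 0.5                       # hypothetical nominal length (m)
q = np.linspace(-1.0, 1.0, 20)
x_measured = L_nominal * np.cos(q)    # noise-free for the sketch

def residuals(params):
    (L,) = params
    return L * np.cos(q) - x_measured

# Perturb the initial estimate (a stand-in for the paper's factor p).
p = 3
L_init = L_nominal + p * 0.01 * rng.standard_normal()

# Levenberg-Marquardt recovers the nominal value from the perturbed start.
sol = least_squares(residuals, x0=[L_init], method="lm")
```

In the real problem, the parameter vector contains all DH parameters of Table 4 and the residual vector stacks the self-contact, plane, and reprojection errors.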
In Figure 28, we show the resulting RMSE of our contact-based calibration approaches on the testing laser tracker dataset D tracker. The results for the self-contact approach with fixed precalibrated cameras and with cameras calibrated as part of the calibration are compared to the self-observation approach, where only camera information is used. All approaches are trained on the same D whole dataset. A comparison to the RMSE when nominal parameters and when DH parameters from the laser tracker calibration are used is also provided. We show results for the whole laser tracker dataset and for the subset of the dataset that corresponds to the area where the self-contact and self-observation approaches were calibrated. On the whole dataset, RMSE is {3.06±0, 2.63±0, 224±77, 3.68±0.20, 3.58±0.21, and 12.36±0.55} mm for nominal parameters (Nominal), laser tracker, self-contact calibration without cameras (noCams (all)), self-contact with cameras being part of calibration (CamCalib (all)), self-contact calibration with fixed precalibrated cameras (FixedCams (all)), and the self-observation approach (S-O (all)), respectively. On the subset of the dataset, the corresponding RMSE is {1.73±0, 1.71±0, 205±71, 2.75±0.17, 2.28±0.17, and 11.88±0.38} mm for the same methods. We can see that including self-contact information as part of the calibration provides significantly better results than the pure self-observation approach.

Observability analysis of individual approaches
We evaluated the observability indices O 1 and O 4 for the different combinations of calibration approaches and parameters subject to calibration. An overview is shown in Fig. 29, with O 1 on top and O 4 at the bottom. Please refer to Section 4.7 for details on how the indices are calculated. The bar groups on the x-axis correspond to the parameters subject to calibration and the number of parameters (columns of the identification Jacobian matrix J). Calibration using the laser tracker adds two additional parameters pertaining to the retroreflector placement (see Table 4). The color coding of individual bars marks the calibration approach with the number of data points in a given dataset without cameras / with cameras (rows of J). Due to the variable size of J, comparisons are to be made with caution. As expected, for the cases without cameras (noCams), there is a trend that as the number of calibrated parameters increases (from ee-noCams, over offsets-noCams, to allDH-noCams), the observability indices decrease. When self-observation as a calibration method and the corresponding camera extrinsic parameters are added (calibCams), the observability index O 1 in general significantly increases. The observability index O 4 increases after adding camera chains for the allDH calibration case, but drops for end effector length calibration: adding new parameters to be calibrated (columns of J) outweighs the effect of additional data from self-observation (rows of J). Comparing situations where the dataset and hence J have a similar size, we see that the horizontal plane was more effective than the vertical plane in our setup and that self-touch slightly outperformed the combination of all planar constraints (all planes). This is largely consistent with the RMSE errors after calibration (Fig. 23). Finally, for calibrating all DH parameters, the inclusion of cameras seems necessary.
Note that the observability analysis is also affected by measurement noise, which may increase the effective rank of the Jacobian matrix. We assume that the measurement noise in this case is sufficiently small (experimental verification is presented in [3]).
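Observability indices of this kind are standard functions of the singular values of the identification Jacobian: O 1 is commonly defined as the Borm-Menq index and O 4 as the Nahvi-Hollerbach noise-amplification index. The sketch below assumes these standard definitions; see Section 4.7 for the exact formulation used in this work.

```python
import numpy as np

def observability_indices(J):
    """Compute O1 (Borm-Menq) and O4 (noise-amplification index) from
    the identification Jacobian J with m rows (measurements) and
    n columns (parameters)."""
    m, n = J.shape
    s = np.linalg.svd(J, compute_uv=False)   # singular values, descending
    O1 = s.prod() ** (1.0 / n) / np.sqrt(m)  # geometric mean, scaled by sqrt(m)
    O4 = s.min() ** 2 / s.max()              # penalizes ill-conditioning
    return O1, O4
```

Adding a calibration chain (e.g., self-observation) adds rows to J and tends to raise the singular values, while adding parameters (e.g., camera extrinsics) adds columns and can lower the smallest singular value, which is exactly the trade-off visible in Fig. 29.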

Conclusion
Using a dual-arm industrial robot with force/torque sensing and cameras, we presented a thorough experimental comparison of geometric kinematic calibration using "self-contained approaches" relying on sensors on the robot (self-contact, planar constraints, and self-observation) with calibration using an external laser tracker. The main findings are summarized below.
First, we studied the estimation of the kinematic parameters of a new tool, a custom end effector (Section 5.1). To calibrate the tool length, self-contact alone proved effective; planar constraints or self-observation alone did not perform as well in isolation and improved only in synergy with one of the other approaches/datasets. Testing RMS errors of approximately 1 to 1.5 mm were achieved on the part of the workspace where calibration was performed. For the orientation of the tool, the addition of cameras (self-observation) was needed. Combining different methods/kinematic chains proved effective, as supported also by the observability analysis (Section 5.7 and Fig. 29).
Second, we analyzed the performance of "daily calibration": the joint angle offsets of a complete robot arm were added to the tool calibration (Section 5.3). While the end effector calibration accounts for most of the error in the workspace, further improvement from the offset calibration is achieved with all methods. To assess whether the calibration is effective only locally, on the particular dataset, or whether better kinematic parameters of the robot were learned, a comparison on an independent dataset covering the whole workspace (laser tracker calibration) was performed (Section 5.4 and Fig. 23). Corrections to the robot parameter values were also analyzed. Importantly, all self-contained approaches except the planar constraints in isolation were able to achieve good results, sometimes outperforming the robot nominal parameters. Self-contact alone proved the most effective; the best results were achieved using a combination of datasets and methods. Self-contact was further improved by the addition of planar constraints. Self-observation alone was sensitive to the size of the dataset but was effective in combination with contact-based methods; the best results were achieved using sequential calibration: camera extrinsics were calibrated first, followed by robot arm kinematics and tool calibration. Additionally, similar results were achieved when perturbing the initial parameter estimates (Section 5.5).
Third, we performed a complete geometric kinematic calibration of one robot arm plus the tool (all DH parameters) using different methods (Section 5.6). Good results are achieved on the individual datasets (Fig. 26), but further analysis reveals that the calibration suffered from overfitting to the particular dataset / part of the robot workspace. On an independent dataset, the performance of the nominal parameters was not matched (Fig. 28). We also found that when all DH parameters are subject to calibration, the need for the synergy of different approaches increases, as evidenced by the unrealistic parameter corrections of the contact-only (noCams) approaches in Fig. 27, the observability analysis (Fig. 29, allDH-noCams), and the self-observation-only approach in Fig. 28. As before, sequential calibration (first cameras, then robot kinematics) gives the best results. Although the performance of the nominal or laser tracker calibration parameters on the whole workspace could not be matched (Fig. 28), the performance of the combination of self-contained approaches, also in the case of initial parameter perturbation, is reasonable (less than 4 mm error) and may suffice for a less accurate platform like a service robot.
Fourth, the comprehensive dataset collected is made publicly available [3] and can be used for additional analyses. This constitutes an additional contribution of this work.

Discussion and future work
Geometric kinematic calibration of industrial robots is usually performed using external metrology: measurement arms (e.g., [4]) or contactless measurement systems like laser trackers [5][6][7][8]. Newman et al. [7] calibrated a Motoman P-8 robot using an SMX laser tracker, improving the RMS error from 3.595 mm to 2.524 mm. Specifically related to the setup used in this work, our platform was previously calibrated using two different methods: (1) the Redundant parallel Calibration and measuring Machine (RedCaM) by Beneš et al. [11] and Volech et al. [12], and (2) a Leica laser tracker. Petrík and Smutný [13] reviewed the precision of these methods using a linear probe sensor. Based on a dataset of 43 different poses with touching end effectors, they calculated the mean error as 0.67 mm (range 2.92 mm) for the CAD model, 0.54 mm (range 2.55 mm) for the Leica-based calibration, and 2.45 mm (range 9.92 mm) for the RedCaM-based calibration. Other approaches (see [8] and additional references cited therein) usually achieve sub-millimeter accuracy. It was not our goal to directly compete with these works; instead, our aim was to assess the potential of automatic self-contained kinematic calibration: using sensors on the robot and avoiding the need for external metrology. We could demonstrate that such self-contained approaches, even if the initial robot parameters are perturbed, can yield less than 4 mm position errors over the robot workspace. The accuracy increases when a combination or synergy of these approaches (e.g., self-contact and self-observation) is exploited. We chose our platform, an industrial robot, out of convenience and to have a plant stable enough to provide stationary results. However, we see the main application area in collaborative and service robotics. These platforms are typically more lightweight, flexible, and less precise, and they may often be redeployed, their kinematic configuration changed, etc.
At the same time, they often come with a rich set of inexpensive but increasingly powerful sensors. Cameras and means to detect physical contact are becoming common. All these factors pave the way for automatic multisensorial self-contained calibration as demonstrated here.
We provide not only a thorough experimental investigation, but also a conceptual contribution in that we develop methods to combine several of the approaches into a single cost function. Further, unlike Birbach et al. [2], who claim that simultaneous calibration using all the available sensors is advantageous, we consistently found sequential calibration to perform best: camera calibration followed by robot kinematics. Conversely, calibrating robot kinematics together with camera extrinsics simultaneously was not as successful. Furthermore, if camera extrinsics were not precalibrated and inaccurate nominal parameters were used, the optimization often converged to physically impossible local minima. One known limitation of closed-loop calibration approaches (those relying on physical constraints on the end effector position or orientation) is that the set of poses is limited, which may affect the identifiability of parameters [1]. This holds for essentially all self-contained calibration, self-contact or self-observation alike, as these also naturally constrain the set of robot configurations available, and it has been the case in this study as well. Better coverage of the individual joint ranges within the approaches used here, in particular for the planar constraints, would further improve the results and yield better generalization to other parts of the workspace. Combining different methods/kinematic chains significantly mitigates this problem.
It should also be noted that although we used specially designed end effectors developed to combine self-contact (acting as a sphere) and self-observation (with flat tiles to host fiducial markers), the methods developed here have wider applicability. Contact can occur at any part of the robot provided that the link can be represented in the kinematic model; fiducial markers can be placed on any robot part or avoided altogether by tracking parts of the robot directly (e.g., [2,17,18]).
There are several directions for future work. First, when combining several calibration approaches into a single cost function (Sec. 4.4), the errors obtained from the different components could be scaled by coefficients that are inversely proportional to their uncertainty. We attempted to acquire such coefficients using repeatability measurements, but failed to obtain estimates that would reflect the true uncertainty associated with different approaches and possibly individual data points. Additional measurements of the uncertainty of individual components (beyond Sec. 4.10) and their propagation would be required. Thus, all components were weighted equally in this work, but this can be changed in the future.
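Such inverse-uncertainty weighting amounts to scaling each residual block by the reciprocal of its standard deviation before stacking, so that all components contribute in comparable, unitless terms. A minimal sketch (the sigma values and function name below are hypothetical, not measured uncertainties from this work):

```python
import numpy as np

def weighted_residuals(res_contact, res_plane, res_pixels,
                       sigma_contact, sigma_plane, sigma_pixels):
    """Stack residuals from the three chains, each scaled by the
    inverse of its (assumed known) standard deviation."""
    return np.concatenate([
        np.asarray(res_contact) / sigma_contact,   # mm / mm -> unitless
        np.asarray(res_plane) / sigma_plane,       # mm / mm -> unitless
        np.asarray(res_pixels) / sigma_pixels,     # px / px -> unitless
    ])
```

With equal weights (as in this work), the sigmas are simply all set to 1.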
Second, the methods presented here can be extended to exploit the existing sensors differently or to incorporate additional sensory modalities. For example, the two cameras in our setup were not used explicitly as a stereo head. Instead of reprojecting the end effector into the two image frames, one could also project the observed position of the end effector in the image coordinates of both eyes (pixel (u, v)) to 3D space (X eye) (similar to [17,56]). The self-contact approach would yield more than a one-dimensional constraint if the contact position (and hence 3 components of the pose) could be measured, such as when using an artificial electronic skin [31,32,35]. Inertial sensors, in the robot head [2] or distributed on the robot body [57,58], could also be added. Finally, one could also calibrate both manipulators simultaneously. Third, for online recalibration to be performed repeatedly, the number of poses / data points needed can be reduced by employing intelligent pose selection (e.g., [59][60][61][62]).
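Projecting the pixel observations of both cameras to a single 3D point can be done with standard linear (DLT) triangulation. The sketch below assumes the usual pinhole model with 3x4 projection matrices; the matrices used in the test are illustrative, not our setup's calibration.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of a 3D point from pixel
    observations uv1, uv2 in two cameras with 3x4 projection
    matrices P1, P2."""
    # Each observation contributes two linear equations in the
    # homogeneous point X; stack them and take the SVD null-vector.
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # dehomogenize
```

The triangulated point X eye could then replace the per-camera reprojection residual with a single 3D residual against the kinematic prediction.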
Fourth, the standard calibration method using non-linear least squares optimization (the Levenberg-Marquardt algorithm) can be compared with filtering approaches [18,26] or with methods that impose fewer assumptions on the initial model (e.g., [63]).