HYBRID ONLINE MOBILE LASER SCANNER CALIBRATION THROUGH IMAGE ALIGNMENT BY MUTUAL INFORMATION

This paper proposes an hybrid online calibration method for a laser scanner mounted on a mobile platform also equipped with an imaging system. The method relies on finding the calibration parameters that best align the acquired points cloud to the images. The quality of this intermodal alignment is measured by Mutual information between image luminance and points reflectance. The main advantage and motivation is ensuring pixel accurate alignment of images and point clouds acquired simultaneously, but it is also much more flexible than traditional laser calibration methods.


INTRODUCTION 1.1 CONTEXT
Mobile mapping systems are becoming more and more widespread because of their ability to capture extremely accurate data at large scale.Mobile mapping faces numerous challenges: • Volume of data acquired calling for very efficient processing to scale up • Complexity of the captured 3D scenes (occlusions, radiometry, mobile objects,...) • Localisation accuracy: GPS outage are frequent in city cores and may lead to large geolocalization errors.
Whereas mobile mapping systems have been traditionally divided between laser and image based platforms, hybrid systems (with both image and laser sensors) are getting more and more popular.For such systems, the laser and image sensors are calibrated independently using standard calibration procedures.In our experience, such independent calibration methods may lead to registration errors of a few pixels between the image and laser data (cf figure 2).We believe that reducing this error below a pixel may unlock new applications for hybrid systems such as: • Joint image/laser scene interpretation • Joint image/laser 3D surface reconstruction and texturing (Vallet et al., 2015) also advocates that precise image/laser alignment is required for mobile objects delineation in images.Moreover, the method that we propose is online, meaning that it does not require a special acquisition in a specially equipped site to perform the calibration, and that it can be used to assess the stability of image and laser calibrations in time.

RELATED WORKS
A calibration method is characterized by: • The features: what is matched between the two sources ?It can be known targets, feature points (Chen and Tian, 2009), regions (Liang et al., 1997), edges or primitives, in which case the method is called sparse, or the points/pixels composing the data, in which case the method is called dense.
• Similarity measure: how good are the matches ?
• Deformation model: how is a source deformed to align to the other ?(rigid, non-rigid, affine...) • Optimisation: which deformation ensures the best similarity ?Note that whereas optimization often aims at minimizing an energy, here we maximize similarity.
Sparse methods use high level features which are more discriminative and lead to a simpler optimization.Plenty of detectors ensure invariance to orientation, scaling, perspective, lighting,... However it might be difficult to find features invariant across modalities and accurate enough, and different features might be detected in the sources leading to error prone feature matching.Conversely, dense methods do not rely on feature detection but only on the similarity measure, which is more simple and yields alignment information over the whole data.However the optimization is more intensive and the similarity measure can be less discriminative than features.
For sparse method, the similarity measure is dependent on the features: based on SIFT descriptors, region shapes, geometric distance between primitives...For dense methods, the most simple measure is the (point to point or point to triangle) squared distance for point clouds, and Zero mean Normalized Cross Correlation (ZNCC) for images.For multimodal registration, the Mutual Information (MI) (Mastin et al., 2009)  Finally, the optimization problem has the form: • I1 et I2 are the data sources • t : transformation to apply to I2 to align it with I1 • T : set of possible transformations • f : similarity measure For a rigid transform, T is parametrized by the 3 components of a translation (the lever-arm) and the 3 bore-sight angles.For a mobile laser scan, this transformation is expressed in a mobile frame so the resulting transform is not a rigid transform, even if it has the same parameters.To optimize these parameters, three classes of methods may be used: Closed form For least squares similarity measure and simple transform space, the optimal transform can be expressed directly in closed form.
Gradient-based If the gradient is easy to compute, it can guide the optimization by following the steepest ascent line.Such method converge rapidly to a precise optimum but for non convex energies, this optimum can be local without guarantee that the global optimum is reached.The various methods (gradient descent, Newton's method, Levenberg-Marquardt, conjugate gradient) mainly differ on the exact choice of the direction and of the step length.
If uncertainties on the inputs are known, then the Gauss-Helmert method can be used (McGlone et al., 2004).
Gradient-free If the gradient is hard to compute or if the energy has many local optimas, a gradient free method can be preferred.Gradient free methods will evaluate the similarity on samples of T with various heuristics to choose the next samples.Monte Carlo methods sample the space with a random distribution that becomes concentrated near local optimas as a temperature parameter decreases.The bats algorithm evolves a population of samples inspired by the behaviour of herds.Genetic algorithms randomly combine the best previous solutions into new ones.The Nelder-Mead (or simplex) algorithm (Deng et al., 2008) evolves a simplex toward the optimum according to the similarity measure on its vertices.The conjugate Powell algorithm (Maes et al., 1997) iterates optimization over individual parameters.
In the case of aerial laser scanner calibration, (Skaloud and Lichti, 2006) choose manually selected planar surfaces as features (especially gable roofs) then automizes this procedure in (Skaloud and Schaer, 2007).The similarity measure is simply the plane to plane distance.The four plane paramteres are added to the set of unknowns and the optimization is a weighted least squares solved by a Gauss-Helmert with constraints.This method is adapted in Riegl's RiPROCESS software (Rieger et al., 2010) where additional constraints on the acquisition are proposed (scans with opposite direction and different angles).(Le Scouarnec et al., 2013) proposes a static calibration where lines from planar surfaces are put in correspondence.Once again plane parameters are unknown and the Gauss-Helmert method is used for the optimization.Finally, (Nouira et al., 2015) 1.3 DATA PRESENTATION The data used in this study was acquired by the mobile mapping system (MMS) described in (own citation).It is equipped with 14 full HD (1920×1080px) cameras acquiring 12 bits RGB images (8 images panoramic + 2 stereo pairs) as shown in figure 3. The laser sensor is a RIEGL LMS-Q120i that was mounted transversally in order to scan a plane orthogonal to the trajectory.It rotates at 100 Hz and emits 3000 pulses per rotation, which corresponds to an angular resolution around 0.12 • .Each pulse can result in 0 to 8 echoes producing an average of 250 thousand points per second.In addition to the (x, y, z) coordinates (in sensor space), the sensor records multiple information for each pulse (direction, time of emission) and echo (amplitude, range).
The amplitude being dependant on the range, it is corrected into a relative reflectance.This is the ratio of the received power to the power that would be received from a white diffuse target at the same distance expressed in dB.The reflectance represents a range independent property of the target.
The MMS is also mounted with an Applanix POS-LV220 georeferencing system combining D-GPS, an inertial unit and an odometer.The laser sensor is initially calibrated by topometry in order to recover the transformation between the sensor and Applanix coordinate systems which allows to transform the (x, y, z) coordinates from sensor to a geographic coordinate system.The result, with a colormap on reflectance, is displayed in Figure 4.
The point cloud is acquired continuously in time, such that when projecting a point cloud to an image, some projected points may have been acquired a few moments before or after the moment the image was acquired.

FRAMES
In this work, we call T αβ the rigid transforms defined by a rotation R αβ and a translation T αβ that transform point coordinates x α in frame α to coordinates x α in frame β: These transforms link the 4 following frames: • The W orld frame W attached to the scene • The V ehicle frame V attached to the vehicle and given by its geopositionning system.It depends on time as the vehicle moves, and is defined by TV W (t) • The Laser frame L attached to the laser that produces point coordinates x L i in this frame (i is the point index).The object of this work is estimating T a LV , through its parameters a = (tx, ty, tz, θx, θy, θz) where T a LV is the translation of vector (tx, ty, tz) and • The Camera frame defined by TCV that we assume known (already well calibrated).Assuming that the image has been resampled to compensate for lens distortion, the image pixel coordinates are defined from Camera coordinates by a simple pinhole model: Using these notations, and knowing the exact times ti at which each 3D point x L i is acquired, and tI at which image I was acquired, we can define the projection of a laser point x L in I: In this framework, re-estimating TLV does not lead to a rigid transform of the pointcloud as TV W (ti) depends on the acquisition time of each point which is illustrated on Figure 5.The estimation of TLV will be performed by maximizing over the transformation parameters r a similarity measure between the reflectance measured on 3D points and the corresponding luminance on the pixels on which they project.

MUTUAL INFORMATION
Mutual Information (MI) is a popular similarity measure for multimodal registration.In the image/laser case, there is not a oneto-one correspondence between the pixels and the 3D points, so we need to choose over which set the MI will be computed.The most straightforward choice is to compute MI over the 3D points projecting in the image (with the initial calibration), keeping the .nthese points.For the MI to be comparable between iterations, this set will not be reevaluated.The features correlated will be the laser reflectance Ri of point i and the image luminance L(p a (x L i )) = L a i estimated from the RGB components for the image (cf figure 1).
Computing MI relies on computing a joint (2D) histogram of these features over such correspondences: where n L bin and n R bin are the respective number of bins of the Luminance and Reflectance histograms.For simplicity, we assume that L a i and Ri are normalized to lie in [0, n L bin + 1] and [0, n R bin + 1].With this convention: allows to associate a feature value to the two closest bins.

LEVENBERG MARQUARDT OPTIMIZATION
We have explained how to compute the MI for a given transform a.The online calibration proposed in this paper relies on finding the optimal transform a * that maximizes the MI.As we can assume a good initialization a0 (calibration by topometry for instance), we chose a gradient descent algorithm: Levenberg-Marquardt (LM).LM optimizes the MI iteratively by: where G is the gradient and H the Hessian of the MI and µ the socalled damping factor.The gradient computation goes following (Dame and Marchand, 2011) and (Dowson and Bowden, 2006): where: and ∇pL a i is simply the image luminance gradient (estimated at p a (x L i ) by bilinear interpolation).Calling ∇α the derivative along coordinates in frame α and applying it to (2), we get: so from (5) we get: where: and following (Chaumette and Hutchinson, 2006): Finally the Hessian is simplified as: ) As advised in (Dowson and Bowden, 2006), we will neglect the second derivative ∇ 2 a p lr as it is approximately zero near the minimum, so its use only improves the speed of optimisation slightly at great computational expense.Consequently, the final Hessian computation only requires p l , p lr and its partial derivatives.After each iteration, we get the new transformation parameters.The µ control parameter defines how far we go along the gradient direction.The larger µ, the smaller are the corrections.µ is initialized at 2 10 = 1024 and adjusted at each iteration: if the MI increases, µ → µ/2 to speed up convergence, else the step is not validated and µ → 2µ.

EXPERIMENTS
Results the image/laser calibration method described above on a first example (Place de la Bastille in Paris) are presented in Figure 6.The quality of the result is evaluated qualitatively on some close-ups shown on Figure 7.
The MI similarity measure decreases from 25.37% for the initial calibration to 29.12%.Figure 8 gives the evolution of the MI through LM iterations.
A second example (Saint Sulpice church in Paris) is presented in figure 9) with close ups in Figure 10.On that example, we voluntarily perturbated the initial calibration to assess the robustness of the method, which explains the larger error.
Because of this higher initial error, the MI increases more significantly from 14.57% to 21.02%.The evolution of MI through iterations is displayed in Figure 11.

INFLUENCE OF THE NUMBER OF BINS
The image grey level is in [0, 255] and laser reflectance in [-20dB, 0dB].These intervals are split in bins to create the histogram.Larger bins reduce computation time and ensure more robustness to noise.The best results were obtained for 32 bins for image gray level and 16 bins for laser reflectance.However, a low number of bins might lead to low precision, so after a first convergence, a more iterations can be performed with an increased number of bins.  Figure 12: Calibration result on a road mark 3.2.2ACCURACY Our initial objective of pixel accuracy is only possible for points acquired close enough (in time) to the image as the drifts in time.For points further away (in time) we observed an error up to 3 pixels after calibration (cf Figure 12).The intrinsic calibration of the image might also lead to errors of around one pixel.

ROBUSTNESS
The MI similarity measure is quite robust to multimodality and illumination changes.This is illustrated in complex urban environments: narrow streets (Figure 9), large squares (Figure 6) or crossings (Figure 13).The calibration always reaches the correct optimum even in the Sain Sulpice example with perturbated input.Robustness to moving objects is also demonstrated on Figure 13: a bus has moved between the instant it was acquired by the image and the laser.Despite its large size, the resulting calibration is still accurate.
Figure 13: Correct calibration result despite a large mobile object (bus) 3.2.4STABILITY To measure the stability of our hybrid calibration, we applied it independently to 3 image/laser couples from the same acquisition but separated by a few minutes each (Figure 14).Convergence was achieved in the 3 cases and the resulting parameters vary of a few centimetres for the translation and below 0.001 • for rotation.The relatively low stability for translation is in contradiction with the pixel accuracy that we observe in the results.The reason is probably that the translation part of the calibration compensates for the georeferencing drift that varies in time.

CONCLUSION
Mutual Information (MI) is a powerful, precise and dense measure that is well adapted to aligning image and laser data.It is robust to illumination variations, multimodality, occlusions and mobile objects because its exploits all the information contained in the image and mobile laser scan.
Its computation time (3 to 7 minutes depending mostly on the number of 3D points projected in the image) is completely acceptable for the typical use scenarii: • Laser calibration on an area with good GPS visibility in order to limit the impact of georeferencing errors.
• Calibration on a few extracts of each acquisition performed by a mobile platform in order to assess the stability of the calibration through time.
In the future, we aim at improving the method in several manners: • Handling multiple images instead of a single one • Authorizing deformations of the laser scan to compensate for the georeferencing drift, or using image/laser attachment in a more global registration procedure.
• Selecting the 3D points to project with a fine point visibility computation.

Figure 1 :
Figure 1: Image (top) and laser data (bottom, coloured with reflectance) on Place de la Bastille, Paris

Figure 3 :
Figure 3: A set of images acquired for one pose (panoramic + 2 stereo pairs)

Figure 4 :
Figure 4: A view of the Mobile Laser Scan (MLS) used in this study, coloured with reflectance.

Figure 5 :
Figure 5: Non rigid transform of a mobile laser scan induced by a rigid transform on the laser frame of 45 • around the vertical yaw axis.Top: Mobile laser scan with correct calibration, Bottom: Mobile laser scan with calibration rotated.Scanlines are not orthogonal to the trajectory any more

Figure 6 :
Figure 6: Place de la Bastille -Before (top) and after (bottom) hybrid calibration

Figure 7 :
Figure 7: Place de la Bastille -Close up on the red (top), blue (middle) and green (bottom) parts before (left) and after (right) hybrid calibration

Figure 9 :
Figure 9: Saint Sulpice area before (top) and after (bottom) hybrid calibration

Figure 14 :
Figure 14: Saint Sulpice results on three extracts before (left) and after (right) hybrid calibration