Heterogeneous Map Fusion from Occupancy Grid Histograms for Mobile Robots

– With the increase in the capabilities of robotic devices, there is a growing need for accurate and relevant environment maps. Current robotic devices can map their surrounding environment using a multitude of sensors as mapping sources. The challenge lies in combining these heterogeneous maps into a single, informative map to enhance the robustness of subsequent robot control algorithms. In this paper, we propose to perform map fusion as a post-processing step based on the alignment of the window of interest (WOI) from occupancy grid histograms. Initially, histograms are obtained from map pixels to determine the relevant WOI. Subsequently, they are transformed to align with a selected base image using the Manhattan distance of histogram values and the rotation angle from WOI line regression. We demonstrate that this method enables the combination of maps from multiple sources without the need for sensor calibration.


I. INTRODUCTION
For robots to operate freely in the environment, they need to gather information about their surroundings.Through sensor readings, it is not only possible to understand the position of the robot relative to other objects but also to record this information and map the environment.This recorded map is crucial for generating motion plans.Therefore, it should be as precise and up-to-date as possible to optimise the robot motion and avoid safety-critical situations [1].
A variety of methods exist that record and map the environment by employing different kinds of sensors.Robot motion estimation can use dead reckoning from the robot's wheels to gain the travelled distance, or IMUs can be used to estimate the scale of the motion [2].In static locations, key point markers can help triangulate the exact location, or a satellite signal can be used in GPS-enabled environments [3].This information is then combined with various distance detection sensors, such as depth cameras, radar, LiDAR, sonar, or similar, and used for simultaneous mapping and localisation (SLAM) of the environment and the positioning of the robot in it [4].This has proven to work well in various applications and has allowed the use of SLAM systems not only to gain an initial estimation of the robot's surroundings but also to update it throughout its lifecycle [5].
However, using an unreliable source of distance information for mapping tasks can result in inaccurate maps, leading to subpar planning of robot motion.For instance, sonar and radar data recordings have to deal with noisy distance estimations.Stereo cameras are susceptible to lighting conditions and the number of features in the scene.Laser-enabled depth cameras have difficulty registering glass structures and have a limited field of view.2D laser sensors can only record a single plane, and 3D sensors still suffer from not being able to detect lightabsorbent and transparent obstacles.Therefore, redundancy in mapping tasks plays a crucial role in enabling the robots to execute their motion in a safe and efficient manner.Sensor fusion can be employed to obtain such redundancy at runtime but requires prior knowledge about the sensor types and calibration to ensure proper functionality [6].This can be cumbersome for systems that may include multiple sensors of different types.Each sensor combination requires different calibration before the start of the program.Alternatively, redundancy can be obtained by merging produced maps from different data sources.Elements that are not recorded from one data source can appear in maps of other types without the need for calibration and a priori knowledge of sensor interactions.Using post-processing of already obtained maps allows for the use of a dynamic number of sensor sources with a simple setup process for SLAM systems.Therefore, in this paper, we propose a method of map merging from heterogeneous sources for mobile robot navigation.

II. RELATED WORK
The task of automating the creation and merging of maps of the environment obtained from different heterogeneous sources is relevant in various fields of robotics.A system, described in [7], enables a mobile robot to acquire knowledge about its surroundings and accomplish tasks by understanding the environment.The mobile robot gathers information from camera sensors about the environment.Through a machine learning module, the robot can identify objects within the surroundings and determine its semantic positioning in the Applied Computer Systems _________________________________________________________________________________________________2024/29 environment.However, map retention and reusability are questionable in such situations due to possible changes in the environment.In [8], a map of an environment is obtained using mapping data generated by an independent cleaning robot during its initial cleaning operation.The method describes the transferability of mapping and behaviour information to an independent cleaning robot to execute a behaviour linked to the label during a subsequent cleaning task; however, it does not address subsequent map updates from new sources.The MRSLAM (multi-robot SLAM) technique, described in [9], primarily includes a resilient map merging algorithm and a decision-making algorithm that governs field agents.Here, a single map is obtained in real time from multiple robots in the same environment but requires similarity in mapping sensor sources.
In [10], a method is introduced for constructing a 3D point cloud map using laser-measured data as the robot navigates its surroundings.To balance the trade-off between data size and terrain accuracy, a terrain-adaptive density mapping technique is employed, utilising local curvatures as criteria for simplification.This adaptive density mapping technique is then integrated into the map merging framework, enhancing both matching speed and accuracy.
Paper [11] introduces a novel algorithm for map fusion in a multi-robot cooperative SLAM framework.This algorithm is designed to create a map of an unfamiliar environment by utilising multiple agile exploration robots.Two major challenges include the accumulation of redundant information in the evolving map over time and the occurrence of mapping errors.In [12], a comprehensive overview is provided of innovative strategies aimed at addressing these challenges by eliminating redundant information from the map and correcting mapping errors.
In [13], real-time sensor fusion for SLAM is proposed through trajectory matching.Although the algorithm is lightweight, it does not consider the integration of maps obtained from different robots at different times.Direct sensor fusion from LiDAR and monocular camera sources is introduced in [14].3D LiDAR and sensor fusion is depicted in [15], [16].Other methods involve sensor fusion between LiDAR, RADAR, ultra-wideband, monocular RGB, RGB-D, and other sensors [17], [18], [6].While these methods perform well during the runtime and real-time collection of mapping data, they do not address the challenge of matching mapping data taken at different times and possibly from different robot devices.Such map fusion needs to account for synchronisation in both time and space across various sensors [19], [20].
In [21], post-processing map fusion is presented, involving multiple mapping sources, one of which is ground truth, and a prior map of the environment.An entirely robot-detected map merging approach is proposed in [22], where LiDAR fusion with monocular RGB SLAM is achieved through keypoint matching.In this method, the same keypoints need to be extracted in all map sources to align the mapping data.In [23], a method is proposed for merging top-down views of the environment with 3D LiDAR data, which is rarely available for indoor robot-enabled locations.In [24], a method of merging heterogeneous maps in a post-processing step is proposed by executing a set of motions by different robots in the same map, information that is not always available or easy to reconstruct.Direct pixel matching and subsequent clustering are proposed in [25].
In this article, we propose an approach for merging 2D maps of the environment based on the creation and use of statistical histograms, which, in turn, rely on occupancy grid pixels.The outlined approach is iterative, involving transformations on this data.As such, the method is generally sensor, recording time, and robot trajectory agnostic, allowing it to be updated whenever a new map of the environment becomes available.

III. PROPOSED APPROACH
The proposed approach utilises occupancy grid maps from multiple sources to generate a single, final map of the environment.Mapping is conducted for the entire area with each data source, and the recorded occupancy grids are employed in post-processing to derive the final map.In general, the map is indifferent to the sensors used in its collection and the robots with which it was recorded, as long as the final result is an occupancy grid of the same scale.The objective of the proposed approach is to merge the provided maps in a manner that maximises the alignment of map elements.
We assume that the recorded maps represent the same environment and that their mapping is complete, i.e., they have captured the entirety of the area under consideration.We designate any map from an arbitrary source as the base map, denoted as Mb.For optimal performance, this map should exclude elements beyond the specified area, such as those recorded through windows or artefacts resulting from mirrors and glass doors.An example of such a map recorded from visual SLAM is shown in Fig. 1.Elements of the base map Mb will remain fixed in the proposed approach as we will use it to align maps from other sources with it.The proposed approach consists of the following steps: Step 1.To establish the bounds of the relevant map data, we utilise pixel histogram information.Based on the data from Mb, histograms of pixel distribution are created along Ox and Oy coordinate axes.As map data can be noisy, and single-pixel values might not accurately represent the actual position of the recorded pixel, we enhance the granularity of the histogram to encompass p consecutive pixels within a single bin.The bin intervals of pixel count values along the Ox and Oy coordinates are collected as follows: ( ) where binx(i) -i-th interval along Ox, biny(j) -j-th interval along Oy. i is the number of bins along x axis and j is the number of bins along y axis.xv and yv represent the pixel value at the coordinate, where the value is 1 if the pixel is occupied and 0 if not.
Based on the histograms in binx and biny, a window of interest (WOI) with dimensions W×H is identified, within which the map is situated.Each bin value in binx and biny is iteratively examined, and the first and last values that exceed the thresholds δhist(x) and δhist(y), respectively, are chosen as the starting and ending values for the WOI for each axis.An example of histograms and the identified WOI, indicated by a blue frame, is shown in Fig. 3. where imin and jmin are the indices of the first valid bin that exceed δhist(x) and δhist(y), respectively.imax and jmax are the indices of the last valid bin that exceed δhist(x) and δhist(y), respectively.
Step 2. For any map from an arbitrary source Ms, the same histogram calculation is performed as in Step 1.The histograms obtained from Ms using (1) and ( 2) are compared to the histograms from the base map Mb by calculating the Manhattan distances: where dx is the Manhattan distance between histograms along Ox, dy is the Manhattan distance between histograms along Oy.The position at which the sum of distances dx + dy is minimised is considered the optimal position for the WOI in the source map.This allows for the translation of the source map to align it with the base map.
Step 3. To estimate the rotation angle between WOI of Mb and Ms, we compute linear regression functions for the data for each map, respectively.For the WOI of Mb, the coefficients of the linear function are expressed as: and are calculated as follows: For the WOI of Ms, the coefficients of the linear function are expressed as: and are calculated analogously as for Mb with (8).
The angle between the linear functions ( 7) and ( 9) represents the rotation angle of Mb relative to Ms and is computed as follows: ( ) ( ) where θ is the rotation angle.Figure 4 illustrates the operation of the map rotation procedure using the WOI of each respective map.
Step 4. After performing the translation and rotation transformations, the source map Ms is updated and becomes Ms′.We evaluate if the translation and rotation are sufficient to fulfil the map fusion criteria.For this purpose, the equation results from ( 5), (6), and ( 10) must fall under a given threshold for each respective value: where εx and εy are threshold values for the translation coefficients and εθ is the threshold value for the rotation angle.If the condition is not met, Steps 1 to 3 are repeated for the new values of Ms′.Execution of the method is completed once the criteria in (11) are met or when the iteration count becomes larger than a pre-set value: where iterN is the N-th iteration of the method, and itermax is the maximum number of iterations.Full method implementation for the map Ms with precomputed values for Mb is described in Algorithm 1.

IV. EXPERIMENTS
To validate the proposed method, we conducted experiments by fusing maps recorded by a mobile robot SLAM system from multiple sources.The base map utilises a visual SLAM recorded map, as shown in Fig. 1.The map fused with the base map is obtained from a 2D LiDAR SLAM and is visualised in Fig. 2.During the experiment, two iterations were carried out to move and rotate data from Ms relative to Mb.The result of the map merging can be seen in Fig. 5.As observed, the data from Ms coincide with the data from Mb where possible.

A. Validation on SLAM Maps
To further evaluate the validity of the approach, a mapping of real-world environments was carried out in two distinct industrial environments.We refer to these as exp1 and exp2, respectively.For both of these experiments, the base map was recorded using a visual SLAM and the source map with LiDAR SLAM.
For exp1 the Mb is visualised in Fig. 6.The Ms is depicted in Fig. 7.The final merged map, after executing the algorithm, is shown in Fig. 8. Here, the overlapping pixels are depicted in blue colour.The pixels from the Mb are shown with red, and pixels from Ms are shown in green pixels.The final map is a combination of both map information that adds additional information from both SLAM sources.We also record the number of pixels generated in the final map.Only the pixel values that represent occupied grids are used for this purpose, as they are the basis for map fusion.Additionally, we show the added data in pixel count that appears from visual and LiDAR SLAM separately.The data is given in Table I.We can see that in exp1 we gain an overlap of map data of 92.2 %.The visual SLAM data gives an additional 2.3 % amount of data that would not be recorded using just LiDAR SLAM.LiDAR SLAM, however, accounts for 5.5 % of additional data that would not be obtained by using only visual SLAM.
The Mb recorded with a visual SLAM for exp2 is visualised in Fig. 9.The Ms recorded with a LiDAR SLAM is visualised in Fig. 10.The final, combined map is visualised in Fig. 11.Blue pixels represent the overlapping pixels, red represents the pixels from Mb, and green from Ms. For exp2, the pixel coverage data is given in Table II.Here, even more data can be obtained in the final map by fusing multiple sources.9 % of unique pixel values account for coming entirely from the visual SLAM source and 21 % uniquely from the LiDAR SLAM source.

V. CONCLUSION
The experimental results demonstrate the effectiveness of the proposed approach on real examples of maps obtained from robot sensors.It is possible to iteratively update the map translation and rotation to fuse maps generated from different sources.The proposed method is agnostic to the map generation method as long as recorded information is preserved in the form of an occupancy grid.Additionally, it is possible to extend the method to use multiple maps by independently fusing them with the base map information.This means that the proposed method does not require intrinsic or extrinsic calibration between sensors and allows the fusion of maps obtained from different robots as well as at different times.Using histograms as the basis for map region estimation provides a lightweight solution for estimating map bounds and fusing such information.
However, the proposed approach relies on several assumptions, one of which is that the recording pixel size is equal across multiple SLAM tasks.This allows for the use of the same scale assumption in the current implementation of the proposed method.Another assumption is that the mapped environment is complete, and the fused environment is fully mapped to the best possible extent.However, it might be possible that only a part of the map is available for some sources, making it difficult to obtain a proper WOI in such cases.
Here, more intelligent methods for estimating relevant WOI patches would need to be implemented, such as estimation by pixel overlap or employing Hough Transforms to estimate key line overlaps.Another issue to consider is merging maps from lifelong mapping systems in dynamic or changing environments.In this case, the map age could be taken into account, and occupancy grid pixels corresponding to newer maps could be weighted.This would enable the use of older maps in the environment to estimate missing elements in subsequent mapping steps and update the possible base map information through lifelong mapping, even if different sensor sources are used.These ideas will be explored in our future research.

ACKNOWLEDGEMENT
This research has been funded by the European Regional Development Fund within the project "Clean 4.0 -Accelerating the Adoption of Autonomous Technologies across the European Cleaning Ecosystem", grant number 1.1.1.1/20/A/186.

Fig. 1 .
Fig. 1.Visualisation of the base map Mb from visual SLAM source.Other maps allow for the presence of artefacts related to scanning open spaces behind windows and doors.Any other map from an arbitrary source is designated as Ms, where s ∈ S as long as s ≠ b.Here, S is all the available map sources.An example of such a map recorded from a 2D LiDAR source is shown in Fig. 2.

Fig. 2 .
Fig. 2. The visualisation of map Ms from a 2D LiDAR source.

Fig. 3 .
Fig. 3.The visualisation of Mb with histograms and window of interest.Window of interest is visualised by the blue rectangle.Red histogram shows bins with pixel values along x axis.Green histogram shows bins with pixel values along y axis.The final WOI is recorded as a rectangle formed of minimal and maximal x and y coordinates as follows: [ ] min min min • , • WOI p i p j = the i-th and j-th histogram bins in base map, the i-th and j-th histogram bins in source map.

Fig. 4 .
Fig. 4. The visualisation of the rotation procedure of the window of interest of Ms relative to the window of interest of Mb.

Fig. 5 .
Fig. 5.The visualisation of map fusion.The Mb is shown in blue colour pixels overlaid over the Ms map.

Fig. 8 .
Fig. 8. Final result of fused map for exp1.Blue pixels represent an overlap of visual and LiDAR SLAM maps.Red pixels represent the visual SLAM pixels, and green -LiDAR SLAM pixels.

Fig. 11 .
Fig. 11.Final result of fused map for exp2.Blue pixels represent an overlap of visual and LiDAR SLAM maps.Red pixels represent the visual SLAM pixels, and green -LiDAR SLAM pixels.