The MADMAX data set for visual‐inertial rover navigation on Mars

Planetary rovers increasingly rely on vision-based components for autonomous navigation and mapping. Developing and testing these components requires representative optical conditions, which can be achieved either by field testing at planetary analog sites on Earth or by using prerecorded data sets from such locations. However, the availability of representative data is scarce, and field testing at planetary analog sites requires a substantial financial investment and logistical overhead, and it entails the risk of damaging complex robotic systems. To address these issues, we use our compact human-portable DLR Sensor Unit for Planetary Exploration Rovers (SUPER) in the Moroccan desert to demonstrate resource-efficient field testing and make the resulting Morocco-Acquired data set of Mars-Analog eXploration (MADMAX) publicly accessible. The data set consists of 36 different navigation experiments, captured at eight Mars analog sites of widely varying environmental conditions. Its longest trajectory covers 1.5 km and the combined trajectory length is 9.2 km. The data set contains time-stamped recordings from monochrome stereo cameras, a color camera, omnidirectional cameras in stereo configuration, and an inertial measurement unit. Additionally, we provide the ground truth in position and orientation together with the associated uncertainties, obtained by a real-time kinematic-based algorithm that fuses the global navigation satellite system data from two antennas mounted on the body. Finally, we run two state-of-the-art navigation algorithms, ORB-SLAM2 and VINS-Mono, on our data to evaluate their accuracy and to provide a baseline, which other navigation algorithms can use as a performance reference for accuracy and robustness. The data set can be accessed at https://rmc.dlr.de/morocco2018.


| INTRODUCTION
Planetary surfaces are mostly explored by mobile robotic platforms, because these environments are difficult and expensive to reach for humans and exhibit hazardous environmental conditions. Owing to the interrupted and high-latency communication, as well as the lack of prior knowledge about the environment, such mobile robotic platforms have to operate autonomously to some extent.
The quality of autonomous decision-making by a mobile robot depends heavily on the navigation and mapping software, as precise localization capabilities are crucial for experiments to be completed successfully. Navigation and mapping solutions increasingly rely on vision-based components: camera systems are commonly used in space missions since they are robust in harsh environments, and space-qualified systems already exist. Extraterrestrial vision-based robot navigation needs to operate under optical conditions that differ greatly from those at most locations on Earth. While a software simulation can bootstrap the initial phases of component development and testing, a navigation solution must usually be tested against inputs with representative optical conditions to satisfy the robustness requirements. The maturity and usability of such navigation solutions therefore depend on the availability of sensor input representative of the targeted planetary environment for development and testing. These data are typically obtained on Earth at planetary analog locations with optical features similar to the targeted extraterrestrial bodies.
However, planetary analog sites on Earth are generally remote locations (see Preston et al., 2012, for an overview) that are difficult to access. Bringing a robotic system to such areas for onsite testing usually results in costly logistics, demands substantial operations personnel, and involves the risk of damaging the robotic system. Offsite testing using previously recorded data sets is a more cost-efficient approach, but it suffers from the scarcity of data sets available for this purpose. Furthermore, data sets allow software components to be run repeatedly to improve algorithms and to test them more efficiently. Algorithms can also be compared with each other with respect to accuracy, robustness, and computational performance.
This paper addresses the onsite and offsite testing of vision-based navigation solutions in planetary analog scenarios by considering three aspects:
• We discuss the use of hand-held sensor devices that resemble the sensor setup of a planetary rover as a cost-efficient, low-risk alternative to onsite field testing. For this, we present the human-portable Sensor Unit for Planetary Exploration Rovers (SUPER) of the German Aerospace Center (DLR) and its application in the 2018 Mars-analog field test in the Moroccan desert.
• During this field test, we recorded the Morocco-Acquired Data set of Mars-Analog eXploration (MADMAX). It is a comprehensive collection of visual inertial navigation data representative of a Mars-roving scenario, containing 36 trajectories with a combined length of 9.2 km. We describe the data set in detail and make it publicly available.
• We use MADMAX with two state-of-the-art navigation algorithms, ORB-SLAM2 (Mur-Artal & Tardós, 2017) and VINS-Mono (Qin et al., 2018), to evaluate their algorithmic performance and to assess the challenges posed by the data set. We show that the hand-held MADMAX data can be considered representative for planetary rover navigation by comparing its navigation results in terms of accuracy with results obtained from a planetary rover prototype. Additionally, we use the results to provide a navigation baseline for our Mars-analog scenario that other navigation algorithms can use as a performance reference for accuracy and robustness.

| Hand-held field testing
Testing of planetary rovers and their navigation algorithms can be performed at increasing levels of complexity. Tests in laboratory environments and in artificially created outdoor testbeds are a good initial way to validate navigation and mapping solutions. A multitude of rover navigation solutions have been tested in such scenarios, for example:
• The ExoMars Test Rover (ExoTeR) is used to test localization and mapping components both in an indoor laboratory environment and in an outdoor test facility of the European Space Agency (Hidalgo-Carrió et al., 2018).
• Rovers Minnie and Mana are used outdoors on artificially created terrain that resembles the features of Mars (Post et al., 2018).
For a state-of-the-art validation of planetary rovers in general, and especially of navigation solutions, field tests in analog environments become necessary. These analog environments are usually remote regions like deserts or volcanoes that resemble the environments of different celestial bodies, typically the Moon or Mars. A detailed list of analog sites on Earth is provided by Preston et al. (2012). In the past, many different planetary rover prototypes were placed in such environments for system testing or for testing of a full rover mission including scientific operations. The list of such endeavors known to us comprises:
• The long-distance rover traverses in the Atacama Desert in Chile, pioneered by Wettergreen et al. (1999) and Wettergreen et al. (2005) with teleoperation and partial autonomy for the rovers Nomad and Zoë, respectively, and, more recently, the long-range autonomous exploration tests by the Seeker rover in the same area.
• Tests for the ExoMars rover mission, such as the SAFER field test (Gunes-Lasnet et al., 2014) in the Atacama Desert and the ExoFit rover tests (Motaghian et al., 2019) in southern Spain.
• The MARS2013 mission in Morocco, where full Mars-analog operations were tested, including scientists in the field, communication infrastructure, and rovers (Groemer et al., 2014).
• The Utah field trials in the United States testing the SherpaTT rover and the Coyote III robot, with a focus on multirobot systems and teleoperation (Sonsalla et al., 2017) and on locomotion capabilities (Cordes et al., 2018).
• The Mojave Desert field test in the United States (Bakambu et al., 2012), with a focus on evaluating navigation algorithms in a Mars-analog environment, specifically considering visual motion estimation and inertial measurement unit (IMU) enhanced wheel odometry.
• A field test in the Teide Volcano National Park on the island of Tenerife, Spain, the results of which are used, among others, by Geromichalos et al. (2020) to validate a Simultaneous Localization and Mapping (SLAM) solution.
• Visual odometry with omnidirectional cameras is evaluated with data from a field test in the Atacama Desert by Corke et al. (2004).
• Our ROBEX demo mission on top of the volcano Etna in Italy, conducted by DLR in 2017 and focusing on a full modular Moon-analog mission (Wedler et al., 2017).
All these test campaigns comprised either a full rover mission or even a full scientific mission scenario. Generally, planetary rovers are highly integrated mechatronic systems, which makes field testing complex, something we experienced ourselves during the ROBEX demo mission (Wedler et al., 2017). The complexity is due to the multitude of components used, which require the presence of many specialists and much equipment at the test site. This results in costly logistics and high manpower requirements. Furthermore, many of these endeavors are used to test various system components of the rover, allocating only limited time for navigation tests and running the risk of technical failures that may delay or endanger the field test.
Our paper's first contribution is to address the complexity of field tests by providing a simplified hand-held rover navigation platform that focuses solely on the visual inertial navigation system (VINS) components of a planetary rover. This is achieved by our SUPER hand-held device, which resembles a planetary rover in terms of sensors but leaves out all other components, such as locomotion, scientific instruments, and representative communication concepts. As SUPER is not a full planetary rover prototype like the Lightweight Rover Unit (LRU), its representativeness for a rover operations scenario on a celestial body can be questioned. However, our analysis in Section 6.3 suggests that the hand-held approach is representative for navigation. We therefore consider this approach the optimal trade-off between cost and representativeness. A similar approach is taken by Furgale et al. (2012), where a pushcart platform equipped with stereo cameras, a sun sensor, and inclinometers is used for a combined Mars- and Moon-analog navigation experiment on Devon Island, in the Arctic North of Canada.
We use SUPER at a Mars-analog site and perform navigation experiments to show that this kind of sensor unit allows for resource-efficient field testing thanks to three factors: its small size, the use of key hardware components only, and the fact that only two persons are needed to operate the device. In our case, the Mars-analog site is located in the north-western region of the Sahara Desert, close to the city of Erfoud in the Drâa-Tafilalet region of Morocco, as shown in Figure 2.

| Related field test data sets
Only a few publicly available vision-based navigation and mapping data sets exist that specifically target planetary robotics. One is the Katwijk Beach Planetary Rover Data set (Hewitt et al., 2018), where a planetary rover prototype performs several long-range traverses on a beach using stereo cameras and Lidar. Another is the data set resulting from the previously mentioned experiments on Devon Island, where the pushcart platform was used to record a long trajectory of 10 km (Furgale et al., 2012). We recorded two long-range navigation runs on the outskirts of the volcano Etna during the ROBEX campaign, using a lightweight planetary exploration rover prototype that relied on stereo vision, IMU data, and wheel odometry for navigation (Vayugundla et al., 2018). Lamarre et al. (2020) present recordings from a Mars-analog outdoor laboratory run by the Canadian Space Agency that include data for visual inertial and omnidirectional camera-based navigation. They also consider energy-budget-aware navigation by providing solar irradiation data. Lacroix et al. (2019) provide navigation data recorded in the Moroccan desert with the wheeled rovers Mana and Minnie.

As we deployed SUPER in the Moroccan desert, we use it to record a comprehensive visual inertial navigation data set, which constitutes the second contribution of this paper. The small size and mobility of SUPER allow it to cover several different locations with varying terrains and to record a variety of trajectories in a short time. In addition, we use the mobility of SUPER to record data at sites that are not yet accessible to the current generation of wheeled planetary rovers, owing to the harsh terrain. In total, we collect data from 36 experiments at eight different locations in three general areas, each with an individual geological character. The recorded sensor input consists of a monochrome stereo camera pair, an RGB color camera, two omnidirectional cameras mounted in a vertical stereo setup, and an IMU. We also compute the ground truth pose from a real-time kinematic (RTK) global navigation satellite system (GNSS) with two antennas mounted on SUPER, which allows us to obtain a ground truth not only in position but also in orientation.
MADMAX is in line with the Etna data set (Vayugundla et al., 2018), as we used closely related systems with similar sensors and an identical software infrastructure in both cases. With these two data sets, it becomes possible to evaluate VINS algorithms in Moon- and Mars-analog scenarios at the same time, without having to adapt to a different system setup.
Like MADMAX, the Morocco data from Lacroix et al. (2019) provides omnidirectional stereo images as secondary sensor data.
The wheeled platforms of Minnie and Mana provide a rover-like movement of the system. We exploit the higher mobility of SUPER to access rougher terrain to collect data. These data become relevant for next-generation planetary rovers with improved locomotion capabilities. Lacroix et al. (2019) provide data where trajectories were traversed several times by the rovers; we instead cover more trajectories of varied character in each location. We provide both a five and a six degrees of freedom (DoF) ground truth compared to the three DoF ground truth that is normally included in planetary navigation data sets. Finally, along with our data set we include an evaluation using two state-of-the-art navigation algorithms.
The three mentioned data sets therefore allow the development and evaluation of robust navigation algorithms that are able to perform independently of system architecture or environment.

| Navigation algorithm performance reference
We use MADMAX with two state-of-the-art SLAM-based navigation algorithms, the visual odometry algorithm ORB-SLAM2 (Mur-Artal & Tardós, 2017) and the visual inertial odometry algorithm VINS-Mono (Qin et al., 2018), to evaluate their performance in a Mars-analog scenario as the final contribution of our paper. The variety of MADMAX enables us to test the navigation solutions in optimal scenarios as well as in challenging corner cases.
We compare the navigation accuracy on this hand-held data set to the results of navigation sequences from an additional SUPER system, this time attached to a rover. This second sensor unit was integrated with the planetary rover prototype shown in the background of Figure 1 and was used to record several navigation sequences. We apply identical evaluation methods in both cases to study potential differences in navigation performance. In the end, this experiment allows us to show that MADMAX can be considered representative for planetary rover navigation.
In addition, the results from the state of the art can be used as a baseline for other navigation algorithms. Our approach is similar to that of Antonini et al. (2020); however, in our case it targets planetary robotics instead of indoor unmanned aerial vehicle operation.
To the best of our knowledge, no such evaluation and publicly available state-of-the-art performance reference for navigation is available for planetary rover navigation yet.

| Outline
The article is structured as follows. In Section 2, we describe the field test scenario in the Moroccan desert, where we recorded MADMAX.
We present our sensor suite SUPER in Section 3, outline the system specifications, provide details on the installed sensors, and describe the reference frame definitions. We introduce the experiment setup in Section 4, together with operational aspects of field testing, our approach for dual-antenna RTK GNSS ground truth computation, and the sensor calibration. We present the resulting data set with its 36 trajectories in Section 5 and discuss the specific characteristics of the different experiment locations and of the individual trajectories. We provide a detailed overview of all sensor data that can be found in the data set and of how the different sensor readings can be related to each other spatially and temporally. We also address challenges that come up in MADMAX, such as influences from the extreme environment, complications that we faced during testing, and lessons learned from the operation. Finally, in Section 6, we discuss the results of two state-of-the-art navigation algorithms on MADMAX and provide a performance analysis of them.

| FIELD TEST SCENARIO

Two SUPER units were deployed in the Moroccan desert, as shown in Figure 1. We utilized the first SUPER in a hand-held approach to record MADMAX. The second SUPER was used in the same area for the final validation experiments of the SRC technology development roadmap PERASPERA with the projects InFuse and Facilitators, as a provider of sensor data for localization, environment mapping, environment reconstruction, and visual tracking (Post et al., 2018); in the context of InFuse, SUPER is referred to as the Hand-held Central Rover Unit. For this purpose, it was integrated with the SherpaTT rover (Cordes et al., 2018).

FIGURE 1 The two SUPER units in the Moroccan desert at the Rissani 1 location: one unit is mounted on the SherpaTT rover (Cordes et al., 2018) of the DFKI Robotics Innovation Center (background) and the other is used as a human-carried device (foreground). The data presented in this article were captured by the hand-held device.

The data presented in MADMAX is widely varied, as the small size and mobility of SUPER allow it to access several different locations for experimentation in a relatively short time. The region around Erfoud offers a rich variety of terrains: from flat to hilly; from sandy and featureless through pebbly to rocky with features of high saliency; from horizons virtually devoid of landmarks to salient landmarks on the horizon; and from easily traversable areas to slopes not traversable by the locomotion systems of current planetary robots, such as high-inclination hillsides or sandy dune fields.

| SYSTEM OVERVIEW
The design of SUPER is inspired by two DLR mobile robotic systems: the LRU, a four-wheeled, full-body-actuated planetary rover prototype, and ARDEA, a micro aerial vehicle (MAV). Competitions and field tests with such systems provide good opportunities for system testing and data recording, but they come at significant organizational and logistics costs. To minimize costs while maximizing the scientific yield, the design of SUPER focuses solely on the sensors of a planetary rover. The result is a device almost identical to the LRU in perception, on-board data processing, and power management capabilities, but one that omits other aspects such as active locomotion and manipulation. See Figure 4 for a detailed view.
As already mentioned, SUPER can be used in two scenarios, as shown in Figure 1: carried by a human or mounted on a rover. Optionally, two GNSS antennas can be mounted to the sides with a wingspan of 1.28 m. We make use of the two antennas to provide a five DoF ground truth that includes not only the position but also roll and yaw information. Together with the antennas on the body of SUPER, a base GNSS station is installed at each experiment location to eliminate atmospheric delays, thus allowing precise positioning estimates. The computation of the ground truth is discussed in Section 4.3.
SUPER is focused on the perception aspect of planetary robotic applications. To keep the system simple, actuators were excluded from the design. This design choice implies that SUPER does not possess an active pan-tilt unit to change the camera orientation. The cameras point downwards at a fixed pitch of 28° relative to the body of SUPER. The camera orientation, especially its heading angle, is aligned with the orientation of the carrier and can thus be actively guided. The height of the sensors depends on the height of the carrier and the adjustments made to the harness used by the carrier. Generally, during our hand-held experiments, the stereo camera bench is located approximately 1.20 m above the ground.
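For intuition, a quick back-of-the-envelope check, assuming flat terrain and measuring along the optical axis of the stereo bench, gives the distance at which the image center meets the ground:

$$d = \frac{h}{\tan 28^{\circ}} \approx \frac{1.20\,\mathrm{m}}{0.53} \approx 2.3\,\mathrm{m}$$

so the center of the stereo view observes terrain roughly 2.3 m ahead of the carrier.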
Furthermore, the hand-held approach implies the absence of wheels; therefore, no wheel odometry is available, a sensor input typically present on planetary robotic platforms. The data from SUPER is targeted solely at VINS algorithms for navigation and mapping, which can be developed independently of wheel odometry or other sensor inputs. Wheel odometry is a challenging scientific topic in itself and was omitted for this field test. Interested readers can learn more about our investigation of this topic in Bussmann et al. (2018), where the slip of the LRU was investigated at a Moon-analog site.

| SUPER as stereo and VINS
An overview of the sensors of SUPER is given in Table 1. The placement of the sensors together with the relevant coordinate frames is shown in Figure 4.
SUPER is equipped with an optical bench carrying three cameras mounted in a row at the front of the device, with parallel optical axes.
The left and right cameras are monochrome and set 90 mm apart, forming the stereo baseline. These constitute the stereo camera bench, which is our primary navigation sensor. The color camera is mounted centrally between them; its data can be used as an additional data source. The processing pipeline for image acquisition, image rectification, and depth image computation is identical to that of the LRU (Schuster et al., 2019). We use the Semi-Global Matching algorithm (Hirschmüller, 2008) to compute the depth images online and onboard. This depth image stream is considered an intermediary data product and is included in the MADMAX data set. Note that the depth image computation is adjusted to the relevant working distance, that is, it considers a maximum disparity of 128 px, which corresponds to a minimum depth of 60 cm.
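These numbers follow from the standard rectified pinhole stereo relation $Z = fB/d$. A minimal sketch of the relation; note that the focal length below is inferred from the stated working-distance numbers, not read from the data set's calibration files:

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified pinhole stereo pair (meters)."""
    return focal_px * baseline_m / disparity_px

BASELINE_M = 0.09        # 90 mm stereo baseline, as stated in the text
MAX_DISPARITY_PX = 128   # SGM disparity search range, as stated in the text

# A minimum depth of 0.6 m at 128 px disparity implies a rectified focal
# length of roughly 0.6 * 128 / 0.09 ≈ 853 px (an inferred value).
focal_px = 0.6 * MAX_DISPARITY_PX / BASELINE_M
print(depth_from_disparity(MAX_DISPARITY_PX, focal_px, BASELINE_M))  # ≈ 0.6 m
```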

| Omnidirectional navigation
The configuration of SUPER is easy to modify thanks to its design, which provides mechanical, electrical, and data connections for adding extra components. We use this advantage to include two additional omnidirectional cameras, mounted in a vertical stereo setup. This configuration offers several benefits: among others, a solution for the absolute orientation can be formulated for the upper camera, and the lower camera can use the optical flow of features on the ground for improved navigation. In addition, on a full rover system, the lower camera can be used as a tool for visual inspection of the locomotion system's health. These are a few suggested uses of the omnidirectional stereo data for navigation and hazard-avoidance purposes.

| Reference frames
The relevant reference frames of SUPER are annotated in Figure 4 and listed in Table 2 with respect to the world, that is, the topocentric frame T. Note that GNSS data only provides positional information and no orientation; however, we define the corresponding GNSS frames for the sake of completeness. Section 4.3 gives details regarding raw GNSS data processing to calculate the ground truth of the SUPER pose, that is, the orientation and position of the central body frame B with respect to the world frame T. This body frame is located above the IMU frame, precisely in the middle between the two GNSS antennas, and is used as a central reference, also for the navigation algorithms in Section 6.
The topocentric reference frame T for each experiment is defined at the respective position of the GNSS base station. Finally, B,start defines the starting point of each trajectory and denotes the position of the body frame at time t_0, with t_0 being the starting time of the respective experiment. All relevant transformations between these reference frames are included in MADMAX for each experiment.

| EXPERIMENT SETUP
In this section, we discuss our field experiment methodology. The variability of conditions and the limited availability of equipment complicate data set acquisition during field tests significantly, as opposed to well-defined laboratory environments. To structure the data acquisition process under such conditions and to facilitate follow-up data evaluation, we use predefined procedures for the experiments and the ground truth recording. Intrinsic and extrinsic camera calibration is performed before the first experiments: we record images of the DLR CalDe calibration pattern (Strobl & Hirzinger, 2011) and process them with the DLR CalLab tool to obtain the camera parameters.

| Experiment procedures
We apply predefined procedures to each experiment to ensure consistency. We use a static platform as a base for SUPER, where we start and finish each trajectory, ensuring that SUPER is placed in the same position and orientation both times. This allows for well-defined trajectory evaluation criteria and can also be used for loop closure in SLAM.
The platform is leveled horizontally and oriented to the east using a spirit level and a compass. This procedure provides a rough initial alignment of the navigation results with the GNSS ground truth and facilitates later processing of the data. The initial seconds (between 8 and 42 s) of each data acquisition run are recorded in a stand-still configuration to obtain static sensor readings for sensor bias evaluation.
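As an illustration, the static segment could be used for a simple bias evaluation along the following lines; this is a hedged sketch with hypothetical CSV column names, and the gravity handling is a rough approximation valid only for small biases:

```python
import numpy as np
import pandas as pd

def static_imu_biases(imu_csv: str, standstill_end_s: float):
    """Estimate gyro and accelerometer biases from the initial static segment."""
    imu = pd.read_csv(imu_csv)  # hypothetical columns: t, wx, wy, wz, ax, ay, az
    static = imu[imu["t"] - imu["t"].iloc[0] < standstill_end_s]
    # At stand-still the true angular rate is zero, so the mean reading is the gyro bias.
    gyro_bias = static[["wx", "wy", "wz"]].mean().to_numpy()
    # The accelerometer measures gravity plus bias; remove the gravity magnitude
    # along the mean specific-force direction (small-bias assumption).
    acc = static[["ax", "ay", "az"]].mean().to_numpy()
    acc_bias = acc - 9.81 * acc / np.linalg.norm(acc)
    return gyro_bias, acc_bias
```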
At each location, neither the platform nor the GNSS base station is moved. Therefore, all runs from that location have common start and end points in the image and ground truth data. This allows for navigation and mapping overlaps between the different trajectories.

| RTK GNSS ground truth
One crucial aspect of field tests is the ground truth. In laboratory setups, the ground truth is usually obtained by high-precision optical tracking systems, which are not available in the field. We therefore compute the ground truth from the RTK GNSS measurements and provide it in two forms:
• an inertial-independent (GNSS-only) 5 DoF solution, sampled every second, and
• a GNSS+inertial 6 DoF reference, sampled at 100 Hz, obtained by fusing the IMU and GNSS measurements.
We recommend the 5 DoF ground truth solution for an inertial-independent evaluation of visual inertial algorithms. The 6 DoF ground truth should be used in the remaining cases, especially for the evaluation of purely vision-based algorithms.
The ground truth estimation solves the navigation problem, in which the position, velocity, and attitude of a moving (rigid) body are determined. The kinematic quantities relate two coordinate systems: (i) the frame whose motion is described, the body frame B; and (ii) the frame with respect to which that motion is expressed, denoted as the topocentric frame T. This study adopts the conventions for rotations and reference frames recommended by Barfoot (2017). Figure 6a illustrates the aforementioned navigation frames.
The state estimate is expressed in a discrete-time state-space model; thus, at time $t$, the state describes the position, velocity, and attitude of the body frame with respect to the topocentric frame, that is, $x_t = (p_t, v_t, q_t)$.

Given the estimated baseline length, the attitude precision is mostly below 0.5° for the roll and yaw estimates in our data set.
Generally, the GNSS accuracy is reported as the corresponding SDs in the ground truth data.
Since only two GNSS antennas were installed, attitude determination for the GNSS-only ground truth becomes an ill-posed problem. As a result, roll and yaw can be accurately estimated, while pitch is not observable, thus providing a ground truth in 5 DoF.
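To illustrate this geometry, here is a minimal sketch of recovering roll and yaw from a single antenna baseline; it assumes an east-north-up topocentric frame, antennas mounted along the body's lateral axis, and near-zero pitch, and it is not the provided MATLAB implementation:

```python
import numpy as np

def roll_yaw_from_baseline(p_left: np.ndarray, p_right: np.ndarray):
    """Antenna positions as (east, north, up) in meters; returns radians."""
    b = (p_right - p_left) / np.linalg.norm(p_right - p_left)  # unit lateral axis
    yaw = np.arctan2(-b[0], b[1])  # heading about the up axis (convention-dependent)
    roll = np.arcsin(b[2])         # tilt of the lateral axis out of the horizontal
    return roll, yaw
```

Pitch rotates the body about this very baseline and therefore leaves the antenna positions unchanged, which is why it remains unobservable with a single baseline.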
We provide the MATLAB code for our two ground truth estimation approaches together with the data set online.

| DATA SET
The focus of our Morocco experiments was to gather relevant planetary-analog data for navigation and mapping that offers variety, both in the type of trajectory and in the type of terrain. An overview of the locations (labeled A-H) is given in Table 3, along with a brief description of the terrain, images of the scene, and the corresponding plots of the GNSS ground truth trajectories.

| Experiment properties
We made sure that all trajectories follow the same recording procedures to make the data set more consistent. As outlined in Section 4.2, all tracks are realized in such a way that the initial and final poses are the same, with the heading always to the east and roll and pitch approximately zero, allowing for at least one loop closure within each run.
All runs obtained at one location have mutually identical start and end poses, allowing for at least two overlaps between each pair of tracks for multirun mapping. Furthermore, all trajectories are recorded in such a way that they overlap on several additional occasions to allow for combined mapping. For each experiment, we choose from predefined categories of trajectories that represent different aspects of navigation. Trajectories of the mapping type aim to cover an area with many overlaps within one run, allowing for dense terrain mapping. Zig-zag trajectories also aim at dense mapping of an area; in this case, a structured grid pattern of motion is used, unlike the unstructured motions of the mapping trajectories. Long-range navigation runs cover long distances and target the evaluation of localization algorithms. We also record trajectories for homing algorithms that follow one path several times with a minor offset. Finally, exploration runs combine several characteristics of the aforementioned types in an unstructured manner. The different trajectory types at each location are listed in Table 3, where the characteristics and mapping overlap of each run can also be seen in the corresponding GNSS ground truth overview plots.

TABLE 3 Experiment overview: GNSS trajectories, color camera impressions of the scenes, and additional information, sorted by location. Camera decalibration is indicated.
The operator was instructed to move at a velocity similar to the movement speed of current or future planetary rovers and to keep this velocity constant. In the data set, the overall average velocity is 29 cm/s, while the average velocities of the individual sequences range between 22 and 48 cm/s. F-0, with an average velocity of 12 cm/s, is considered a special case.

| Data set content
We provide sensor data from all sensors listed in Table 1. The ground truth is provided in two ways: as the GNSS-only 5 DoF ground truth and, in addition, as the 6 DoF fusion of GNSS and IMU data. Note that the timestamps of the postprocessed GNSS measurements are temporally synchronized with the other sensor data of SUPER. The raw GNSS data is provided in UTC time according to the RINEX specification; thus, the temporal synchronization with the other sensors has to be taken into account. The temporal offset between GNSS time and the UNIX system time is listed in the metadata.yaml file for each experiment. Note that this time offset differs for each day, as SUPER was not connected to clock synchronization servers during the field campaign.
Additionally, the metadata file lists detailed information for each experiment, such as the precise location coordinates of the base station and the timestamps of the experiment start and of the start of data recording, respectively. One key piece of information is the initial pose of SUPER with respect to the base station, that is, the transformation from B,start to T.
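A minimal sketch of applying this offset when relating raw GNSS timestamps to the UNIX-time sensor streams; the metadata key name used here is hypothetical:

```python
import yaml

def gnss_to_unix(gnss_time_s: float, metadata_path: str) -> float:
    """Shift a raw GNSS timestamp into the UNIX time base of the other sensors."""
    with open(metadata_path) as f:
        meta = yaml.safe_load(f)
    # The offset differs per experiment day, as SUPER had no clock server access.
    return gnss_time_s + meta["gnss_unix_offset_s"]  # hypothetical key name
```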
In addition to the experiment data, we provide calibration data.
This includes the intrinsic and extrinsic camera calibration as callab_camera_calibration*.txt, together with the resulting camera parameters for the rectified images as {camera}_rect_info.txt. The transformations between the relevant coordinate frames from Table 2 are provided as well, as a collection of transforms between a parent and a child frame, given as position and quaternion orientation in .csv files.
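A minimal sketch of loading and chaining such transforms, assuming hypothetical column names and the usual parent-to-child composition:

```python
import pandas as pd
from scipy.spatial.transform import Rotation as R

def load_transform(csv_path: str):
    """Read one parent->child transform given as position and quaternion."""
    row = pd.read_csv(csv_path).iloc[0]  # hypothetical columns: px, py, pz, qx, qy, qz, qw
    t = row[["px", "py", "pz"]].to_numpy(dtype=float)
    rot = R.from_quat(row[["qx", "qy", "qz", "qw"]].to_numpy(dtype=float))
    return t, rot

def compose(t_ab, r_ab, t_bc, r_bc):
    """T_AC = T_AB * T_BC, for example chaining world->body and body->camera."""
    return t_ab + r_ab.apply(t_bc), r_ab * r_bc
```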
The navigation results for ORB-SLAM2 and VINS-Mono from Section 6 are included as well. They are provided in two formats: The original result data with respect to the camera and IMU frames, respectively, and the data aligned with the GNSS ground truth. Both are text files with timestamp, position, and orientation quaternion for each pose.
Finally, we provide the data set from the SUPER calibration. This includes raw images from each camera with the calibration pattern visible, the specification of the calibration pattern dimensions, as well as the calibration results in a text file.
All data can be freely accessed online at https://rmc.dlr.de/ morocco2018. The website shows details on each experiment location and the experiments performed, plus one section for the calibration data. The data is available individually for each experiment, structured as shown in Figure 7 and is provided in a compressed format.

| Issues, challenges, and lessons learned
Operations in extreme environments pose special challenges to the system, the operators, and the experiments. In our case, several challenges and technical issues were encountered, which we could partially account for.
Many recordings contain optical disturbances that make the data set challenging. One disturbance appeared particularly during afternoon experiments: lens flares due to a low sun position. Another was the strong over- or underexposure of image regions due to shadows in the field of view. The moving shadow of SUPER introduces optical features in the recorded images similar to those caused by a planetary rover. Our analysis concluded that the human carrier did not introduce any additional undesired shadows into the images that might disturb navigation algorithms. All of these disturbances were desired, as they create challenging scenarios for planetary navigation algorithms and serve as a robustness test. Because the operator moved slowly and exposure times in bright sunlight were short, no significant motion blur was observed.
The first issue to mention is the extrinsic calibration of the cameras. Once the field tests had been completed, we evaluated the quality of the extrinsic stereo camera calibration by comparing the vertical displacement of sampled features within selected stereo pairs of each run. It turned out that the last runs of the Morocco campaign, labeled G runs, exhibit a vertical feature offset greater than 2 px, which we consider a sign of decalibration. As a result, the computed depth images for these runs are less accurate and contain several invalid regions. Furthermore, the accuracy of the camera-to-IMU calibration is degraded in these runs. The cause of this calibration error is most likely an unexpected mechanical load during transport to the experiment site. Nevertheless, we publish the G runs so that the data can be used to test the robustness of algorithms against extrinsic decalibration; indeed, Section 6 shows that VINS-Mono and ORB-SLAM2 obtain accurate navigation results for the G runs. All other runs turned out to have accurate calibration. For future field tests, we recommend recalibrating the cameras and the camera-to-IMU transformation on a daily basis to ensure high data quality.
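A minimal sketch of such a vertical-offset check, using ORB features and brute-force matching in OpenCV as one possible choice of detector and matcher (not necessarily the exact procedure we used):

```python
import cv2
import numpy as np

def median_vertical_offset(left_path: str, right_path: str) -> float:
    """Median |y_left - y_right| of matched features in a rectified stereo pair.
    In a well-calibrated pair this should be near zero; a median above ~2 px
    hints at extrinsic decalibration."""
    left = cv2.imread(left_path, cv2.IMREAD_GRAYSCALE)
    right = cv2.imread(right_path, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(1000)
    kl, dl = orb.detectAndCompute(left, None)
    kr, dr = orb.detectAndCompute(right, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(dl, dr)
    dy = [abs(kl[m.queryIdx].pt[1] - kr[m.trainIdx].pt[1]) for m in matches]
    return float(np.median(dy))
```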
Throughout the field test, we experienced network problems that specifically affected the stereo cameras connected via gigabit Ethernet (GigE vision®). As a result, several frame drops occurred.
These frame drops usually lasted for one to four consecutive frames (up to a quarter of a second) and seldom reached half a second, that is, up to eight consecutive missing frames. Our analysis shows that this still accounts for an inter-frame overlap of 80%-90% and 70%-80%, respectively.
Most runs experience frame drops of only 5%-10% of the overall frame count. However, the F runs are strongly affected with a loss of 15%-19%. We attribute the issue to the network hardware used in SUPER, which was chosen due to its lightweight design.
Reconfiguring the network settings made the issue less prominent, but the general problem still prevailed. Generally, no direct correlation was found between the number of frame drops and poor navigation results, which we discuss in Section 6 in detail. The individual losses per run are listed together with the data. To overcome frame drops in the future, a more robust network setting has to be considered, even though this would require more heavyweight components to be used.
Finally, our GNSS solution occasionally lost precision in its measurements (we consider position measurements with an SD of more than 0.06 m to be imprecise) for two reasons:
• The RTK GNSS quality depends significantly on having a precise geo-reference solution for the base station. During the G and H runs, we recorded the base station GNSS data for intervals that were too short to obtain a sufficiently precise base station solution, leading to poorer accuracy in the corresponding pose estimation of SUPER. For future experiments, a prolonged data recording for the GNSS base station should be included in the experiment schedule.
• During some experiment runs, the SUPER antennas lost the signals of several GNSS satellites for a few seconds. On such occasions, the precision was usually good enough to provide a satisfactory position estimate, but the orientation suffered significantly.
The GNSS inaccuracies occurred in 11% of all measurement points, not counting the runs B7, C2, and F0, which were more strongly affected with rates exceeding 40%. The accuracy of the measurements is represented in the GNSS pose estimate SD for each timestamp. Any algorithm or any evaluation that also considers the associated uncertainties should not be affected by this issue.

| EVALUATION WITH STATE-OF-THE-ART NAVIGATION ALGORITHMS
We evaluated the data set using two state-of-the-art SLAM-based navigation algorithms. The algorithms provide us with a 6D-pose of the SUPER system, which is subsequently compared with the GNSS ground truth. The motivation is to provide the navigation results and insights as a baseline against which other algorithms can be compared, and we invite interested researchers to do so.

| Evaluation results
MADMAX is a large-scale data set that provides suitable sequences to test the stability and long-term use of SLAM. Note that the results and evaluation shown in this section do not aim to compare performance specifically between ORB-SLAM2 and VINS-Mono, but rather to evaluate general differences between stereo and visual inertial setups, using the selected algorithms as respective examples. Their performance also illustrates the opportunities the data set provides for testing navigation algorithms. Additionally, the evaluation aims to provide a navigation baseline for the respective category, which can be used to benchmark other algorithms. The SLAM algorithms compute 6D poses for every frame of the sequences, which we compare with the GNSS ground truth. The details of our evaluation are as follows:
• Both systems have been tested using half-resolution images from the monochrome cameras to achieve real-time performance on our institute computers (Intel Xeon E5-1630, 3.70 GHz, 16 GB RAM, CPU-only computation).
• We initialize VINS-Mono with an estimation of the extrinsic calibration parameters from the initial calibration and let the system refine them online.
• The association with the GNSS data has been performed by considering only GNSS measurement points with an SD lower than 0.06 m.
• For evaluation, we use the absolute trajectory error (ATE) and the relative pose error (RPE), as proposed by Sturm et al. (2012).
• We consider the fully optimized trajectories that use all data available at the end of each run.
• We use ORB-SLAM2 and VINS-Mono with loop closing and relocalization capabilities enabled for each individual sequence, but without map reuse between the runs.
The ATE calculates the root-mean-square error (RMSE) of all global positions $\hat{p}_t$ along the frames of the estimated trajectory with respect to the GNSS ground truth correspondences $p_t$, after both curves have been aligned using the method from Horn (1987). The resulting error at timestep $t$ is
$$e_t^{\mathrm{ATE}} = \lVert \hat{p}_t - p_t \rVert$$
and the overall ATE is
$$\mathrm{ATE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} \lVert \hat{p}_t - p_t \rVert^2}$$
for a total of $N$ evaluated poses.

The RPE computes the RMSE of the difference of traveled distances between the estimated trajectory and the ground truth. The traveled distance between two frames separated temporally by $\Delta t$ is defined as
$$d_t = \lVert p_{t+\Delta t} - p_t \rVert$$
and the resulting RPE as
$$\mathrm{RPE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N} \left(\hat{d}_t - d_t\right)^2},$$
where we choose $\Delta t = 1$ s. The step-wise RPE at timestep $t$ is therefore $e_t^{\mathrm{RPE}} = \lvert \hat{d}_t - d_t \rvert$. Note that while the ATE computes the absolute difference between the two trajectories in meters, the RPE evaluates the average pose drift in meters per second.
Since, on occasion, one of the SLAM approaches may not be capable of computing the complete trajectory, for a fair comparison we use these metrics only when at least 75% of the trajectory's traveled distance has been completed successfully.
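A minimal sketch of the two metrics as defined above, assuming the estimated trajectory has already been aligned to the ground truth with the method from Horn (1987), filtered to precise GNSS points, and resampled to common timestamps at 1 s spacing:

```python
import numpy as np

def ate_rmse(p_est: np.ndarray, p_gt: np.ndarray) -> float:
    """Overall ATE: RMSE of aligned global positions (Nx3 arrays), in meters."""
    return float(np.sqrt(np.mean(np.sum((p_est - p_gt) ** 2, axis=1))))

def rpe_rmse(p_est: np.ndarray, p_gt: np.ndarray) -> float:
    """Overall RPE: RMSE of per-step traveled-distance differences.
    With 1 s steps, the result reads as average drift in meters per second."""
    d_est = np.linalg.norm(np.diff(p_est, axis=0), axis=1)
    d_gt = np.linalg.norm(np.diff(p_gt, axis=0), axis=1)
    return float(np.sqrt(np.mean((d_est - d_gt) ** 2)))
```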
The exemplary results for four trajectories of SUPER computed by ORB-SLAM2 and VINS-Mono are illustrated in Figure 8 together with the GNSS ground truth.

| Navigation robustness
To evaluate the navigation robustness, the estimated percentage of accomplished trajectory is shown in Figure 9 for each sequence. It can be seen that the visual inertial navigation is more robust than stereo, since it finishes most of the sequences, a significantly greater number than ORB-SLAM2. It turned out that the frame drops in the recordings, mentioned in Section 5.4, do not correlate directly with the navigation robustness of the algorithms. This is shown, for example, by run F2 (also shown in Figure 8), which has one of the highest frame drop rates at 19%, yet VINS-Mono and ORB-SLAM2 both still provide accurate results. The runs F3 and D2 in Figure 8 show the case of tracking loss for ORB-SLAM2, where relocalization succeeds for the D2 run and fails for the F3 run.

| Navigation accuracy
Leaving aside the fact that the visual inertial algorithm manages to complete more sequences than its stereo-based counterpart, it also performs slightly better in terms of ATE accuracy, as shown in Figure 10. Evaluating the 16 sequences in which both algorithms reach more than 75% completion, VINS-Mono outperforms ORB-SLAM2 in 10 sequences, whereas ORB-SLAM2 performs better in six runs.
Nevertheless, ATEs for both systems are within the same range.
On the other hand, ORB-SLAM2 outperforms VINS-Mono in 15 out of the 16 sequences in terms of RPE, as shown in Figure 11. Nevertheless, the navigation results of both algorithms can be considered accurate; see, for example, the trajectories of run F2 shown in Figure 8. Until tracking is lost, ORB-SLAM2 also provides accurate navigation results for the sequences F3 and D2 (Figure 8). The same figure also shows the G0 run, which is one of the runs with extrinsic decalibration. Clearly, both algorithms cope with such decalibration and provide reliable navigation results. Apart from the obvious advantages of visual inertial SLAM over stereo SLAM in terms of robustness for outdoor environments with long-term trajectories, we have not been able to observe any major differences between the two state-of-the-art SLAM pipelines.

| Comparison of hand-held and rover-based navigation
Finally, we investigate how representative the hand-held data is for planetary rover navigation, answering the question of whether human-induced motions negatively affect the navigation algorithms.
We take the seven navigation sequences that were obtained by the rover-mounted SUPER unit and evaluate them using ORB-SLAM2 and VINS-Mono. We apply evaluation methods identical to those in Section 6.2. Note that these data belong to the InFuse project (Post et al., 2018) and are therefore not included in MADMAX.
The seven sequences are 26-54 m in length. The trajectories are mostly straight drives combined with wide curves and took place in, and close to, the area of the C runs. Therefore, both MADMAX and these seven rover-based sequences experience nearly identical environmental conditions. The average velocity of the rover-mounted SUPER is 12 cm/s, about one-third of the velocity of the hand-held navigation (29 cm/s). The position of the stereo cameras is approximately 0.80 m above the ground.
The main difference between the two data sets is therefore the type of movement, where the human-induced motions may influence the navigation negatively. Recall that we experienced no motion blur in either of the two data sets, thus limiting the difference solely to the type of motion. We claim that MADMAX can be considered a representative data set for rover navigation if the ATE and RPE of the experiments match the errors of the rover-mounted SUPER experiments. We consider the C runs and the D-0 to D-2 runs for comparison, as these were obtained in the same location or feature similar types of trajectories, respectively. Recall that the RPE depends on the experiment velocity, as it measures the distance error per unit time. We therefore expect it to be lower by a factor of three for the rover-bound experiments.
Regarding the rover-based navigation, ORB-SLAM2 completes six sequences, failing only on run 3, whereas VINS-Mono completes all seven runs. Figure 12 shows the respective navigation results in terms of ATE and RPE. Indeed, the ATE lies in the same range as that of the comparable runs from MADMAX, generally around 0.5 m with peaks of 2-3 m. The RPE is approximately one-third of that of the hand-held runs, which is expected owing to the threefold difference in velocity. We therefore conclude that the motions from human-based transportation do not negatively influence the navigation. This indicates that MADMAX consists of representative planetary rover navigation data, supporting our case in favor of hand-held field testing.

| CONCLUSION
In this paper, we presented a field testing approach for planetary robotics navigation and mapping algorithms and for test data recording that fills the gap between laboratory tests and complex full-rover-system field tests. To do this, we deployed a compact hand-held sensor abstraction of a planetary rover, SUPER, in a Mars-analog environment in the northern Sahara in Morocco. The result of the field test is the comprehensive Mars-analog VINS data set MADMAX, which we make publicly available.
This data set includes recordings of monochrome stereo cameras, a color camera, two omnidirectional cameras in a vertical stereo setup, and an IMU. The experiments took place in several distinctive locations, and we outlined the variety and character of the different experiments. We discussed several operational aspects that turned out to be crucial for a successful data set recording, such as the ground truth computation of position and orientation from the GNSS data, procedures for data recording, and the calibration of five different cameras relative to each other, including the two omnidirectional cameras.
Finally, we showed that the recorded data can be used for navigation by applying the state-of-the-art algorithms ORB-SLAM2 and VINS-Mono. We evaluated their performance in this planetary-analog setting, showed their mostly high accuracy, but also revealed corner cases where these algorithms fail. We compared the performance of the algorithms with a rover-based data set and showed that our hand-held approach does not negatively influence the accuracy of the state of the art.
It became apparent that MADMAX is a challenging data set for planetary navigation that can be used as a robustness test and performance reference for new navigation approaches. We make the data publicly available and provide detailed information about it to facilitate the use of the recordings.