Online Mapping and Motion Planning under Uncertainty for Safe Navigation in Unknown Environments

Safe autonomous navigation is an essential and challenging problem for robots operating in highly unstructured or completely unknown environments. Under these conditions, not only robotic systems must deal with limited localisation information, but also their manoeuvrability is constrained by their dynamics and often suffer from uncertainty. In order to cope with these constraints, this manuscript proposes an uncertainty-based framework for mapping and planning feasible motions online with probabilistic safety-guarantees. The proposed approach deals with the motion, probabilistic safety, and online computation constraints by: (i) incrementally mapping the surroundings to build an uncertainty-aware representation of the environment, and (ii) iteratively (re)planning trajectories to goal that are kinodynamically feasible and probabilistically safe through a multi-layered sampling-based planner in the belief space. In-depth empirical analyses illustrate some important properties of this approach, namely, (a) the multi-layered planning strategy enables rapid exploration of the high-dimensional belief space while preserving asymptotic optimality and completeness guarantees, and (b) the proposed routine for probabilistic collision checking results in tighter probability bounds in comparison to other uncertainty-aware planners in the literature. Furthermore, real-world in-water experimental evaluation on a non-holonomic torpedo-shaped autonomous underwater vehicle and simulated trials in the Stairwell scenario of the DARPA Subterranean Challenge 2019 on a quadrotor unmanned aerial vehicle demonstrate the efficacy of the method as well as its suitability for systems with limited on-board computational power.


Introduction
Autonomous robots have been increasingly employed to assist humans notably in hazardous or inaccessible environments in recent years. Examples include rescue missions in disaster response scenarios (Murphy et al. 2008), in-water ship hull (Hover et al. 2012) and wind turbine quality assessment inspections (Morgenthal and Hallermann 2014), underwater archaeology (Bingham et al. 2010), and deep underwater and space exploration (Whitcomb et al. 2000;Galceran et al. 2015;Katz and Some 2003), among many others. A fundamental requirement for a robot engaged in any of these applications is to be adept at navigating autonomously through highly unstructured and hostile environments. However, this is not a trivial task due to limited or complete lack of prior knowledge about the environment in which the robot has to operate. This implies that the robot has to base its decision making on on-board sensors despite their limited accuracy. In addition, the robot itself might suffer from poor localisation, as well as restricted and uncertain manoeuvrability. Therefore, even though challenging, it is essential to jointly consider all these motion and sensory constraints as well as their associated uncertainties, when planning for navigation actions. This problem becomes particularly more challenging in safetycritical missions where the safety of the robot must be ensured at all times.
Although there exist alternative methodologies addressing each of the above-mentioned issues individually, limited attention has been devoted to the autonomous navigation problem in unknown environments as a whole. The classical algorithms known as simultaneous localisation and mapping (SLAM) enable a mobile robot to concurrently build and use a map to estimate its location (Durrant-Whyte and Bailey 2006). These algorithms rely on identifying distinctive landmarks which can bound the uncertainty of both the environment representation and the robot localisation. Nonetheless, even for scenarios rich in features, there are always some residual uncertainties. More recently, online motion planning frameworks have been developed to empower a mobile robot to compute navigation actions in unexplored environments while accounting for the system's motion capabilities, e.g. (Hernández et al. 2016;Ho et al. 2018;Hernández et al. 2019;Vidal et al. 2019;Youakim et al. 2020). These approaches, however, do not cope with any source of uncertainty and employ ad-hoc heuristics which lack quantified safety guarantees. The few attempts to ensure safety through probabilistic methods, such as (Strawser and Williams 2018;Janson et al. 2018;da Silva Arantes et al. 2019), are generally computationally expensive, are built on strong assumptions, and commonly suppose a complete prior knowledge of the surroundings. Therefore, they are unsuitable for applications requiring online computations or dealing with unknown environments.
In this context, our previous framework guaranteed (in compliance with a user-defined minimum probability of safety) the robot's safety when navigating through unexplored environments (Pairet et al. 2018). The underlying strategy consisted of an iterative mapping-planning scheme capable of continuously modifying the vehicle's motion plan towards a desired goal according to the incremental environmental awareness. At any time, the resulting motion plan was guaranteed to be feasible and safe in face of localisation, mapping and motion uncertainties. This was achieved by incrementally encoding the vehicle's surroundings as an uncertainty-aware map, and by planning feasible trajectories (according to the system's kinodynamic constraints) over this representation, which provided probabilistic safety guarantees by taking into account the uncertainty on the system's localisation and motion, as well as the uncertainty on the environment awareness.
Despite the satisfactory results achieved in our previous work, the framework had some limitations. Foremost, the framework was exclusively tailored to cope with twodimensional (2D) workspaces. The performance of the mapping-planning scheme and its constituent components would scale poorly when dealing with scenarios and systems of higher dimensionality. Furthermore, in-depth discussion, empirical analyses, and thorough experimental evaluation of the framework's key components were lacking. Finally, experimental validations were limited to scenarios with symmetric underwater structures, thus not illustrating the framework's capability to navigate through more challenging environments. Bearing the previous framework limitations in mind, this manuscript presents an extended and improved version of our framework that attempts to overcome these limitations.
While preserving the general mapping-planning scheme of our previous work (Pairet et al. 2018), the main contributions of this manuscript are: 1) the extension of the planning strategy to a multi-layered architecture, allowing for rapid search of a solution in high-dimensional belief spaces while preserving asymptotically optimal and completeness guarantees, 2) the reformulation of the probabilistic collision checking routine, enabling the planner to efficiently evaluate the validity of a state subject to uncertainties and to trade the tightness of the safety bound for computational efficiency, while accounting for the tail events, and 3) the rigorous theoretical development and thorough experimental evaluations of the key constituent components of the framework, as well as the framework as a whole. All in all, these novel advancements allow for faster online motion planning and more efficient evaluation of uncertainties. Consequently, the framework enables the computation of navigation actions online for high-dimensional systems and more challenging unknown environments while still providing safety guarantees. To the best of the authors' knowledge, this is the first generic architecture capable of jointly dealing with kinodynamic and probabilistic constraints in unknown environments online. Both the precedent and new framework are analysed and compared in multiple scenarios with different interesting real-world * and simulated † physical systems. The experimental results demonstrate the suitability of the proposed method to address the challenge of probabilistically-safe autonomous navigation in unknown environments while being suitable for systems with limited on-board computational power.
The remainder of this manuscript is organised as follows. Section 2 provides a comprehensive review of the literature and the corresponding contribution of this paper. Then, Section 3 formally defines the considered problem. In Section 4, an overview of the framework is presented, and then the mapping and planning components are detailed in Sections 5 and 6, respectively. The description of the framework is followed by a thorough analysis of its key constituent features and its performance and capabilities as a whole in Section 7. Finally, the paper concludes with a discussion in Section 8.

Related Work
This section gives a brief overview of prior work on planning under kinodynamic constraints and planning under uncertainty, as well as frameworks for online mappingplanning. Finally, this section discusses all contributions of this work with respect to the latest related literature.

Planning under Kinodynamic Constraints
Planning under kinodynamic constraints deals with the challenge of computing trajectories which are feasible according to the vehicle's motion capabilities. This problem is commonly formulated as finding a trajectory between two points through the system's state space. The robotics literature offers various approaches to tackle this problem.
One strategy is to represent the continuous state space as a lattice space, i.e., a graph where edges correspond to a reduced set of precomputed motion primitives. Then, the motion planning problem can be efficiently solved using graph search algorithms. For the particular case of a carlike system, the motion primitives can be defined as a set of lines and arcs to build a geometric state lattice (Dubins 1957;Reeds and Shepp 1990). These approaches can find the shortest path, but the transition between segments presents abrupt changes in angular velocity, which could only be achieved by a system capable of infinite angular acceleration. More complex lattice space definitions allow the consideration of more restrictive concatenation rules and richer sets of primitive motions, e.g. (Frazzoli et al. 2005;Pivtoraiko and Kelly 2011), at the cost of more memory usage and more computationally expensive queries. Even though planning in lattice spaces has proven to be suitable for many applications, it requires the crafting of a set of motions such that the resulting lattice offers, at least, one suitable solution to the planning problem. Some works in the learning community have addressed this difficult and timeconsuming task with data-driven techniques (De Iaco et al. 2019). However, the resulting set of motions still represents a very limited range of the real dynamic capabilities of the robot. This is undesirable in applications where the environment is not known in advance, and where having the entire dynamic range of motions available for planning can be critical to finding a suitable solution. Moreover, accurate lattice-based methods struggle with high-dimensional state spaces due to the required grid-like discretization.
To deal with kinodynamic constraints, sampling-based motion planners offer great opportunities, e.g., (Kavraki et al. 1996;Hsu et al. 1997;LaValle and Kuffner Jr 2001). Most sampling-based planners, however, lose their asymptotic optimality guarantees when a steering function does not exist in the system's kinodynamically constrained state space. To cope with this limitation, there are different assumptions and heuristics that can be applied at the expense of longer computational times. For example, Webb and van den Berg proposed a version of the asymptotic optimal RRT (RRT*) which can deal with kinodynamic constraints of systems with linearisable dynamics (Webb and van den Berg 2012). If the system's dynamics are not linearisable, asymptotic optimality can be obtained in any planner by augmenting the dimensionality of the state space to account for the search cost (Hauser and Zhou 2016). However, this strategy implies solving the planning problem repeatedly to improve the cost of the solution at each iteration, consequently being unsuitable for applications with online requirements. Finally, the stable sparse RRT (SST) planner offers asymptotically near-optimal guarantees by means of a shooting approach, which consists of expanding the tree from the node with the lowest cost within a neighbourhood of pre-defined δ-radius (Li et al. 2016).
Planning in high-dimensional spaces with multiple constraints poses a challenge for classical planners and typically result in long computation times if a solution can be found at all. In such problems, a common approach to boost performance is via a multi-layered planning scheme.
The key idea is to leverage from a lead to guide (warmstart) the search. In this regard, an interesting approach is the incremental trajectory optimization for motion planning (ITOMP) algorithm, which interleaves planning and optimisation; the planner is given a fixed time budget to find a solution, which is then used as a warm-start for the optimiser (Park et al. 2012). Work (Plaku et al. 2010;Plaku 2015) introduced a synergistic three-layered planner: the high-level planner uses discrete search to initially determine those candidate regions (from a decomposed representation of the environment), which might contain part of the final solution; a low-level planner employs a sampling-based motion planner to find a solution; a middle layer updates the candidate regions according to the considered constraints.
However, the proposed combination of planners does not guarantee asymptotic optimality, and the discrete planner becomes slow for high dimensional problems. Palmieri et al. presented the Theta*-rapidly-exploring random tree (RRT) scheme, which first uses the Theta* path planner to compute a lead path, which is then employed to bias the search of the RRT planner (Palmieri et al. 2016). This approach, however, lacks asymptotic optimality guarantees given that the second planner is an RRT. More recently, a multi-layered approach based on the RRT* as a lead planner and the SST as the final planner has been proposed in . The final planner's search space is strictly constrained around the lead path, raising concerns about the completeness guarantees of the overall architecture.

Planning under Uncertainty
An essential capability for any autonomous robot is to operate in the presence of uncertainty (Dadkhah and Mettler 2012). Sources of uncertainty relevant to autonomous systems fall into four types (LaValle and Sharma 1995): • Uncertainty in localisation: the robot's location is uncertain with respect to the environment. This issue is particularly critical in robots operating in GPSdenied environments, or for systems suffering from low-accuracy state estimation. • Uncertainty in motion (dynamics): the future robot state cannot be predicted accurately, either because of discrepancies between the considered and the real system's dynamic behaviour, or due to limited precision in the system's command tracking. • Uncertainty in the environmental awareness: the robot has inexact or incomplete information about its surroundings (e.g. obstacle location). This issue can arise from inaccuracies in the a priori map, or imperfect and noisy exteroceptive sensory capabilities. • Disturbances in the operational environment: the robot is subject to external factors, such as wind, atmospheric turbulences or water currents, which make the robot deviate from the planned trajectory, thus compromising the reliability of deterministic path planning techniques.
This section scrutinises relevant planning strategies dealing with any of the three first sources of uncertainty. One approach that is popular among existing planners is based on discrete Markov processes. This strategy models the evolution of the system in the environment and generates a policy over the approximated Markov states. Examples of such motion planners include stochastic motion roadmap (SMR) (Alterovitz et al. 2007) and incremental Markov decision process (iMDP) (Huynh et al. 2012). These methods have shown to be effective and provide optimality guarantees in terms of probability of reaching a desired goal; however, they assume perfect knowledge about the environment. Works such as (Luna et al. 2014) have extended these techniques to partially unknown environments. Nonetheless, their large computational times remains the main hurdle in applications with fully-unknown environments or requiring online planning.
Another approach to deal with uncertainties in planning is by means of feedback controllers and sampling-based planners. Van Den Berg et al. proposed the linear quadratic Gaussian (LQG) motion planning method, which finds the best path simulating the performance of LQG on all extensions of an RRT (Van Den Berg et al. 2011). This idea was later applied in roadmaps to propose the feedback-based information roadmap (FIRM) (Agha-Mohammadi et al. 2014). This method, though, relies on full a priori awareness of the environment to explore the belief space offline, to then quickly perform queries online. Consequently, this strategy is not suitable for planning applications where the a priori information about the environment, if available, is not fully informative. Alternatively, Sun et al. presented the high-frequency replanning (HFR) architecture, a strategy which leverages from an LQG and a multi-thread RRT, allowing to continuously replan in the face of alterations in the robot or environment space, while accounting for uncertainties. However, the asymptotic optimality guarantees of such a method can only be assured if a multi-threaded implementation is realised.
An alternative approach to dealing with uncertainty is the chance-constraint strategy. In these methods, instead of maximising the probability of success, the objective is to find a path that satisfies a minimum safety probability constraint. The challenge in incorporating this method in planners lies in the computation of the safety probability over plans. In (Blackmore et al. 2011), linear chance constraints are combined with disjunctive linear programming to perform probabilistic convex obstacle avoidance. This concept was extended and integrated into a sampling-based planner, leading to the chance constrained RRT (CC-RRT) (Luders et al. 2010) and the CC-RRT* (Luders et al. 2013). These approaches evolve the system's dynamics in an openloop fashion, hence growing the uncertainty unboundedly forward in time. To improve accuracy, linear chanceconstraints was applied after propagation of the system's state conditioned on the precedent states being collision-free (Patil et al. 2012). Such a strategy is commonly referred to as truncating the distribution estimating the system's state, and its usage in planning led to the CC-RRT*-D planner (Liu and Ang 2014). The advantage of chance-constraint-based methods is that satisfying plans can be computed quickly, making them desirable for online applications. They are, however, built on strong assumptions which result in overly conservative calculations, and rely on the prior knowledge of a convex environment. Nonetheless, chance-constraint methods are still one of the most widely used strategies in the planning community to deal with localisation, motion and environmental uncertainties, e.g., (Strawser and Williams 2018;da Silva Arantes et al. 2019).
In recent years, planners based on various discretization methods have been developed to deal with limited computational power or online planning requirements in face of uncertainty. Majumdar and Tedrake proposed a precomputed library of funnels to efficiently estimate the system's kinodynamic and uncertainty propagation in three-dimensional (3D) environments (Majumdar and Tedrake 2017). However, library-based approaches consider a reduced set of the real system's capabilities which can endanger the efficacy of the planner. Another approach in favour of performance consists in approximating the computation of the probability of collision to a discrete support (Strawser and Williams 2018;Pairet et al. 2018). This strategy truncates the infinite expand of the belief in a bounded patch considered to contain a large portion of the belief's probability mass. In our previous work (Pairet et al. 2018), all uncertainties were projected onto a discrete support, referred to as kernel, whose resolution resembled the optimal one for online mapping applications. Although considering a discrete support for the computation of the probability of collision allows for quick calculations, none of the works using such technique actually normalises the calculations for the probability mass laying outside the patch, i.e., tail events. Therefore, they cannot offer guarantees on the compliance of the probabilistic safety constraints.

Frameworks for Online Mapping-Planning
Limited attention has been devoted to the online mapping and planning problem as a whole, especially in the face of uncertainties. Current frameworks in the robotics literature are built on strong assumptions which could endanger (or completely neglect) some of the essential requirements for safe navigation in undiscovered environments. Some of the prerequisites are the ability to create an uncertainty-aware representation of the environment, such that uncertainties about the environment can be considered at the planning stage. It is also crucial to ensure completeness guarantees, i.e. the ability of finding a solution if one exists, and among many others, being capable of guaranteeing the vehicle's safety at any time during the mission. Ideally, an online mappingplanning framework should be able to find paths quickly while offering asymptotic optimality guarantees.
The common strategy for online mapping and planning is to continuously replan in the face of changes in the robot's pose or the environment awareness. Scherer et al. endowed an UAV with the capability to map online with an occupancy probabilistic grid, to then guide itself towards the goal with a combination of global and local potential field-based planners (Scherer et al. 2008). Along this line, navigation in 3D environments by mapping from stereo vision and planning with the RRT was considered in (Andert et al. 2011). The resulting paths of these approaches do not account for kinodynamic constraints nor safety guarantees. Alternatively, in (Lin and Saripalli 2014), the local planner of the RRT approximated an UAV capabilities by 3D Dubins paths. Nevertheless, none of these approaches considers any of the multiple sources of uncertainty in the mapping nor the planning stage, thus not providing any theoretical performance or safety guarantees.
More recently, (Ho et al. 2018) proposed an online framework to build an uncertainty-aware map and plan over it using the RRT. However, the resulting paths do not meet kinodynamic nor safety constraints. Instead, proposals in (Hernández et al. 2016 presented an online framework to plan paths under motion constraints for AUVs, but their approach assumes zero uncertainty. Whilst their framework succeeded in solving start-to-goal queries in unexplored real-world environments, their planner used adhoc heuristics to estimate the risk associated with the solution path, and approximated the system's dynamics with Dubins curves. A similar framework is employed in , where the SST planner is employed to propagate an approximated dynamical model of the system. To counteract the computational expenses, the SST is provided with a subregion of the state space drawn from a lead path. Finally, (Youakim et al. 2020) presented a multirepresentation, multi-heuristic A* planner capable of jointly dealing with the requirements of mobile-base and manipulation planning in unknown environments while accounting for localisation uncertainty via heuristics. Despite all methods have been tested in real-world environments, the underlying frameworks lack of theoretical analysis and do not provide a measure of robustness or quantified safety guarantees.

Closely Related Contributions
An early version of the work presented in this manuscript has appeared before (Pairet et al. 2018). This consisted of a simpler framework that proved to be suitable for real-world motion planning problems, but its applicability was strictly limited to underwater robots operating at constant depth, i.e., 2D workspaces. This motivated the development of this follow-up work to extend the framework's capabilities to suit the requirements of a larger group of robotic systems and environments. Given the precedent efforts by the authors, this manuscript provides the following contributions: • An online mapping-planning framework that guarantees the robot safety during navigation tasks in unknown environments (see Section 4). The framework now decouples the calculation of environmental uncertainties from the probabilistic collision checking routine, thus boosting the overall planning speed. • A rigorous explanation of the mapping strategy using local submaps and the process of efficiently retrieving the environmental uncertainties in form of an uncertainty-aware map (Section 5). These calculations now consider probabilistic map fusion to deal with the overlapping local submaps. • An extension of the previous single-layered planner to a multi-layered scheme to guide the search, thus improving the planner's performance (Section 6). Kinodynamic constraints and probabilistic safety guarantees are still met. The probabilistic completeness and asymptotic optimality guarantees are also preserved. • The formulation of a rapid probabilistic collision checking routine subject to a controllable confidence level α (Section 6.3). Adjusting α allows to trade the tightness of the safety bound for computational efficiency, while correcting for the tail events (i.e. the probability mass excluded by the confidence level). • A thorough evaluation of the framework's key constituent components and the framework as a whole (see Section 7). This assessment considers extended analysis and additional experiments including simulations on different dynamical systems, and robot deployments on challenging real-world environments.

Problem Formulation
In this work, the focus is on the challenging problem of safe autonomous navigation in unexplored environments for a mobile robot. To start with, the robotic system must be capable of perceiving and creating a consistent representation of the surroundings despite its potentially Symbol Description occupancy probability at x F X cumulative map over X Planning b start estimated system state at start X goal goal region in state space B goal goal region in belief space p goal minimum probability of goal p safe minimum probability of safety ∆T M P overall planning budget time ∆T L lead planner budget time ∆T C constrained planner budget time ξ lead geometric path X lead region in state space ξ feasible and safe trajectory uncertain localisation. The perceived surroundings must be encoded efficiently such that the robot can exploit it online for planning purposes. Besides the mapping requirements, the process of planning navigation actions towards a desired goal is challenging by itself. The robot must not only account for its limited and uncertain manoeuvrability, but also for the evolving awareness and uncertainty of the surroundings as the robot moves. This section provides formal definitions for these uncertainties and the problem of safe autonomous navigation in unexplored environments. The nomenclature used through this manuscript is summarised in Table 1.

Motion Uncertainty and Constraints
Consider a mobile robot that operates in a workspace W ⊂ R nw , where n w ∈ {2, 3}, under motion uncertainty. The uncertainty in the robot's motion can be due to many reasons, e.g., unmodelled dynamics or noise in actuation, and can be described in several ways. In this work, inspired by (Nguyen-Tuong et al. 2009;Le Ny and Pappas 2009;Hemakumara and Sukkarieh 2013;Beckers et al. 2017), the evolution of the uncertain robotic system is assumed to be given by a Gaussian process. That is, the robot state x ∈ X ⊆ R nx at every time step k is described by a Gaussian distribution, i.e.: where b k is referred to as the belief of x k and is fully defined by meanx k and covariance Σ x k . The set of all beliefs is called the belief space and denoted by B. Intuitively, B is an uncertain representation of the state space X .
Meanx ∈ X ⊆ R nx is the nominal state of the robot and evolves according to: where u ∈ U ⊂ R nu is the control input, and f : X × U → X captures the nominal (known) dynamics of the robot. Covariance Σ x ∈ R nx×nx >0 describes the uncertainty around the nominal robot state and evolves according to: where g : is the covariance function. Examples of Gaussian process representations for robots with linear, unicycle, and fixed-wing dynamics are provided in Appendix A. Methods for modelling robots with (partially) unknown dynamics as Gaussian processes are discussed in (Nguyen-Tuong et al. 2009;Jackson et al. 2020).

Environment Uncertainty
Some applications in robotics lack a complete awareness of the environment, either because there is no information of the surroundings or due to the presence of dynamic elements in the workspace. This work scopes the mapping requirements to undiscovered static environments. In order to reveal the obstacles in the environment, the robot is equipped with exteroceptive sensors, such that it can autonomously explore the surroundings as it moves, i.e., to integrate into the map the obstacles when they are inside the sensor's detection range. Importantly, most sensors uniquely detect points on the boundary of a nearby obstacle.
This work assumes no uncertainty in the robot local observations denoted by h k . To transform this local observation from the robot frame to the global frame, let h k ∼ N (ĥ k , 0). Bearing in mind that the robot's location might be uncertain with respect to the global frame b k ∼ N (x k , Σ x k ), the observed point is represented in the global frame as: is the result of the Gaussian relationships via a compounding operator ⊕ explained in Appendix B. From these uncertain points x O , the robot constructs a probabilistic map M. Then, the obstacle occupancy probability for point x ∈ X denoted by F X (x) is the sum of the normally distributed densities in M. The cumulative sum over all space X is called cumulative map and denoted by F X .

Probabilistic Safety Guarantees
The system's uncertainty and the environment's uncertainty are jointly considered to guarantee the vehicle's safety. More specifically, the probability of the system being in collision with an obstacle in the environment at time k is characterised by: where F X (x) is the cumulative obstacle occupancy probability, as introduced in Section 3.2. Then, given a minimum probability of safety p safe , we require 1 − p collision (b, M) ≥ p safe for every belief b on the trajectory in order to probabilistically guarantee the robot's safety.

Planning Problem
Therefore, the planning problem considered in this work seeks a dynamically feasible trajectory in the belief space B which is probabilistically safe. Formally, let B goal ⊂ B denote the set of all belief states that correspond to the desired goal region X goal in the environment as: where, and p goal is the minimum probability that a belief must satisfy for being considered to be in the goal region. Then, the constrained planning problem is to compute a sequence of controls u 0 , u 1 , . . . , u T −1 ∈ U that result in a dynamicallyfeasible trajectory ξ : [0, T ] → B for the robotic system described by (1), (2), and (3) such that ξ(0) = b start ∈ B, i.e. the system estimated state at the beginning of the mission, ξ(T ) ∈ B goal , and 1 − p collision (ξ(t), M) ≥ p safe for all t ∈ [0, T ].

Framework for Online Mapping and Motion Planning
This manuscript presents a framework that endows a robotic system with the capability of safely navigating through unknown environments. This is achieved by means of online mapping and online motion planning of trajectories that meet motion and probabilistic constraints. The framework, depicted in Figure 1, is threefold: (i) a mapping module that incrementally builds an uncertainty-aware map, (ii) a planning module that continuously computes a safe and feasible trajectory towards the goal, and (iii) a framework manager that coordinates the overall framework's execution.
The remainder of this section describes the manager's strategy to control the interaction between the two core modules of the framework, i.e. the mapping (see Section 5) and the planning (see Section 6). Note that although the presentation of the framework focuses on the online mapping and planning challenge, the proposed online scheduling intrinsically solves the offline motion planning problem.

Framework Pipeline
The framework manager coordinates the mapping (MAPPER) and planning (PLANNER) modules according to the pipeline presented in Algorithm 1. This is, given the desired goal region B goal and the required probabilistic safety guarantees p safe , the manager conducts an iterative process until the system reaches the predefined goal region (line 7). An iteration consists of solving an updated version of the underlying motion planning problem which accounts for any alteration to the system's state and environment awareness. Each iteration starts with the manager predicting a suitable planning frame R for the planning problem. The planning frame defines the state where the planner will start exploring the state space for a solution. As discussed in Section 4.2, the planning frame is determined according to the current plan in execution ongoing_traj (line 8). Then, the manager retrieves from the MAPPER the current environment awareness as a cumulative map F R X relative to R (line 9). Both the predicted planning frame R and the updated cumulative map F R X are provided to the PLANNER (line 10 and 11). Before proceeding to solve the updated planning problem, the current plan in execution ongoing_traj, if any, is probabilistically checked for collision according to the current uncertainty-aware map F R X . In the event of ongoing_traj not being any longer valid, the framework manager dispatches to the robot the segment ongoing_traj of ongoing_traj that is still safe (line 12 and 13). This approach prevents the vehicle from stopping every time that a trajectory gets partially invalidated while ensuring its safety.
Finally, the PLANNER attempts to solve the planning problem by growing a new tree in B for a specific amount of time ∆T M P (line 14). The PLANNER tries to find a near-optimal trajectory which meets kinematic and probabilistic constraints within the allocated time budget ∆T M P , and returns a new_traj if one is found (line 15). The newly found new_traj is uniquely dispatched to the robot when it fulfils the selection criteria defined in satisfiesCriteria() (line 16 to 18). This work bases the selection criteria satisfiesCriteria() on the length of the trajectory; new_traj is dispatched if length(new_traj) ≤ length(ongoing_traj), where length(ongoing_traj) = ∞ if ongoing_traj is partially invalidated, i.e. it does not reach the goal region B goal .
Note that the computations in line 8 and 9 are low demanding and they can be scheduled in parallel to the main execution of the framework's pipeline. Therefore, the overall iteration rate of the framework is 1/∆T M P , as solving the planning problem (line 14) is the unique process of the framework that requires a non-negligible amount of time.
Given the nature of the problem of navigation in unknown environments, it may be possible that a feasible and probabilistically safe trajectory towards the goal region does not exist. Therefore, the framework is endowed with a contingency plan that attempts to return the vehicle nearby the deployment location b start . This contingency plan gets activated when the planner has not been able to find a solution in the last n cp consecutive iterations, where n cp is a user-defined safety value. In the event of the contingency plan getting activated, the MANAGER is reinitialised with the new planning problem. Note that if the environment awareness is highly uncertain, there might not exist a trajectory towards the new goal region. In this situation, not considering the previous map information for planning would allow the vehicle moving safely towards the deployment location. In case no feasible motion plan is found to return to the deployment location, an emergency manoeuvre should be performed, e.g., coming to complete stop for ground vehicles, going to the water surface for AUVs, and immediate landing for UAVs.

Prediction of the Planning Frame
Optimising motion planning strategies, such as the one presented in this work, employ a time budget ∆T to find a trajectory which solves a predefined planning problem.
Consequently, there is, at least, a time lapse ∆T between the definition of the planning problem and the usage of the resulting trajectory. In applications where the robot state does not change during ∆T , it is reasonable to define the planning frame R ∼ x W R as the current system's state , which is represented with respect to the global frame W . However, in applications where the robot might be in movement in the course of ∆T , such as the online mapping and planning application targeted in this work, a suitable planning frame must be specified ahead of time. Ideally, the frame of the planning problem should be defined such that it corresponds to the robot state at the time the resulting trajectory will be utilised.
Bearing in mind that the presented framework computes a candidate new trajectory periodically every ∆T M P and that it has full knowledge of the plan in execution ongoing_traj, calculating a planning frame at time t (line 8) which is suitable at time t + ∆T M P (line 18) can be formulated as a state prediction problem. This is, given the current robot state x W R and the set of subsequent controls u involved in the execution of ongoing_traj, x W R is computed by integrating (2)-(3) for the time-horizon ∆T M P .
Predicting the frame of the planning problem ahead on time (i) guarantees the feasibility of reaching the initial state of any found solution from the current robot state, (ii) prevents sudden changes in the vehicle's direction of motion when transitioning from the ongoing_traj to the new_traj (line 16 to 18), and (iii) enables the planner to periodically set a planning problem that considers any update on the robot's localisation estimate.

Incrementally Mapping Unknown Environments via Local Maps
Incrementally exploring the environment with a locationuncertain system leads to an uncertain representation of the surroundings. Under these conditions, obtaining a consistent and reliable representation of the entire environment is a challenging task commonly addressed with probabilistic inference approaches. These algorithms rely on gathering data from which distinctive features (landmarks) can be extracted and used to bound the uncertainty of the environment representation and system localisation. Nonetheless, even for scenarios rich in features, there are always some residual uncertainties. Moreover, onboard perception sensors usually suffer from noises, which compromise the accuracy of the environment representation. All these issues motivate the need of an environment representation that jointly explains captures the uncertainty on the true obstacle's localisation and the detection confidence according to the sensor model, while being suitable for motion planning. In this work, such a representation is referred to as probabilistic map. This section details the undertaken mapping approach, which builds a set of local occupancy submaps whose base poses are uncertain with respect to a global frame (see Section 5.1). Each submap is an occupancy grid map, which provides an efficient strategy to encode the incremental environment awareness (see Section 5.2) and retrieve information about the environment occupancy (see Section 5.3 and Section 5.4). This overall mapping strategy has proven to be suitable for real-time robotic mapping and planning applications in our previous work (Pairet et al. 2018), and despite being out of the scope of this manuscript, has also shown to be effective for online mapping and localisation applications (Ho et al. 2018).

Global Map as a Set of Local Submaps
There are different alternatives to represent the incremental knowledge of an environment. The framework presented in this manuscript encodes the environment M via a set of n local stochastic submaps Tardós 2007, 2008) due to its demonstrated efficiency on dealing with applications requiring real-time robotic localisation, mapping and planning (Pairet et al. 2018;Ho et al. 2018). Formally, the local submaps method is defined as: A new local submap LM n+1 is initiated every ∆T LM such that the accumulated localisation error within the active local submap LM n is low. In other words, the local mapping time horizon ∆T LM must be defined such that it always maintains the robot pose uncertainty Σ LMn R within the active local map LM n negligible.
The coordinate system of a new local submap LM n+1 is defined at the robot state estimate when LM n+1 is initiated, i.e. x W LMn+1 = x W R . It is assumed that the robot starts building LM n+1 as soon as it finishes the LM n . Therefore, the robot state at the end of LM n (defined as the last global robot state when building LM n ) is the same as the global robot start state of LM n+1 . For simplicity, the origin of the global map W is chosen to be the same as the coordinate frame of the first local submap LM 1 , i.e. the robot's initial state. Figure 2 illustrates the concept of using local submaps to map the incremental knowledge about the environment. Particularly, the figure depicts a robot which has been navigating in an unknown environment, while in the meantime, it has been encoding the perceived surrounding environment in a total of eight local submaps. Noteworthy, the example assumes an open-loop navigation, i.e., without localisation updates. Therefore, the first defined submaps are less uncertain with respect to the global frame W than those built at a later stage. This fact corresponds to an unbounded growth of the uncertainty on the system localisation estimate.

Local Submap as Occupancy Grid Map
The assumption of null uncertainty on the robot pose within each local submap, also referred to as known robot poses, enables the representation of each local submap as an occupancy grid map. The chosen alternative to efficiently encode an occupancy grip map is via Octomaps (Hornung et al. 2013). Octomaps permits fusing range-based data into a probabilistic voxel representation, which generates an occupancy grid map with adjustable resolution. Octomaps store the information in an octree data structure, which provides fast access time while, at the same time, optimises the memory usage. All these desirable features make the undertaken mapping strategy ideal for online mapping and planning.
The probabilistic sensor fusion within an occupancy grid map is performed as an Octomap (Moravec and Elfes 1985;Hornung et al. 2013). This is, the probability P (v|h 1:k ) of a cell v to be occupied given a set of sensor measurements h 1:k is estimated as: where P (v|h k ) is the inverse sensor model characterising the sensor used for mapping and P (v|h 1:k−1 ) is the preceding estimate given all historical measurements. Using log-odds notation: and under the common assumption of a uniform (noninformative) prior, i.e., P (v) = 0.5, (11) is simplified to: L(v|h 1:k ) = L(v|h 1:k−1 ) + L(v|h k ).
To change the state of a node v, (13) requires as many observations as the ones used to define its current state. This overconfidence in the map is addressed as in (Yguel et al. 2008) by using a clamping policy to ensure that the confidence in the map remains bounded: where l min and l max denote lower and upper bound on logodds values. As a consequence, the model of the environment remains updatable (Hornung et al. 2013).
The measurement update rules in (13)-(14) can be used with any kind of distance sensor, as long as the inverse sensor model is available. Our framework employs the extended beam-based inverse sensor model depicted in Figure 3. This model assumes: (i) that the line of sight between the sensor origin and the endpoint of a measurement does not contain any obstacle (free space), (ii) that endpoints correspond to obstacle surfaces (occupied space), and (iii) that the line continuing beyond the endpoint until the maximum sensor range is likely to be occupied by the observed obstacle (occluded space). Then, the extended ray-casting operation to update each voxel v from the sensor origin to the maximum sensor range is performed using the following logodds inverse sensor model: where l f ree and l occ are constants determined according to the sensor model, and l ocl penalises occluded zones according to the decaying function: where for a decay rate γ ∈ (0, 1), l ocl decreases γ times for each unit of d, which is the distance from the measurement endpoint. This corresponds to l ocl = l occ for d = 0, i.e., in the hit point, and to l ocl → 0, i.e., to a non-informative P (v) = 0.5, as d → ∞. The maximum expand of the occluded region is as far as the sensor range.

Map Fusion and Single Point Query
An occupancy query to the current probabilistic map M is done by converting the given query into multiple local queries. The occupancy probability values at each local submap can be fused together by means of the logodds update rule in (13) with the corresponding clamping operation in (14). These operations apply because combining measurements from multiple local submaps is a similar operation as combining multiple measurement updates in a single global map (Ho et al. 2018). Without loss of generality, assume that an occupancy query at positionx Y is performed from an uncertain coordinate frame Y with known pose estimate This global query corresponds to the multiple local log-odds occupancy queries: where L 1:i−1 (x Y ) is the accumulative log-odd estimate from the precedent i − 1 local submaps with L 1:i−1 (x Y ) = 0 for i = 1, L i (·) implies that the log-odds lookup is done in the local submap LM i , and x LMi is calculated via the linear estimation of known spatial relationships: where ⊕ denotes the compounding operation and corresponds to its inverse relation, as commonly used to simplify notation when calculating spatial transformations (see Appendix B for a brief introduction and (Smith et al. 1990) for a full review). Given thatx Y in local coordinates follows a probabilistic distribution, the local occupancy query L i (x LMi ) is: where v represents the set of voxels in submap LM i and P i (x LMi ) can be described in log-odds L i (x LMi ) notation via the log-odds transform.

X
The previous section provides a strategy to query the occupancy probability P (x) of a single point coordinate x ∈ X . Our previous work (Pairet et al. 2018) demonstrated that this approach is suitable for the requirements of an online planner under probabilistic constraints. However, bearing in mind that each planning cycle requires numerous queries of P (x) involving different x, the overall planner performance can be enhanced by computing the map fusion before the planning time budget starts. The probabilistic map fusion over all state space X is described by the cumulative distribution F X over the local density distributions of the sensed environment ‡ . In particular for the online planning problem, it is of interest to fuse the map information with respect to the predicted planning frame R (see Section 4.2), such that the cumulative map F R X reflects the relative uncertainty between the current environment awareness and the planning frame R . Figure 4 illustrates the extraction of F R X from a set of local maps. Computing F R X implies that the computational requirements of retrieving P (x R ) during planning time are reduced to those of a look-up table in the cumulative map F R X . Subject to the log-odds transformation, F R X is computed by rewriting (17)-(19) as: where L 1:i−1 (X R ) is the accumulative log-odd estimate from the precedent i − 1 local submaps with L 1:i−1 (X R ) = 0 for i = 1, L i (·) implies that the log-odds lookup is done in the local occupancy submap LM i , and X LMi ∼ N (X LMi , Σ LMi R ) corresponds to the state spacê X R in local coordinates defined as: Then, the occupancy probability L i (X LMi ) at LM i for all x ∈ X LMi is computed as: where v represents the set of voxels in submap LM i , K α (·) with confidence level α = 1 is a kernel representing the discrete version of a Gaussian distribution over the entire span of the local submap LM i (see Appendix C). ⊗ is the correlation operator, i.e. a sliding inner product, and P i (X LMi ) can be described in log-odds L i (X LMi ) via the log-odds transform.
Interestingly, the underlying computation of F R X is the correlation operator ⊗, a common technique for which there exist efficient implementations. On top of that, the independence between local submaps allows parallelising the computation of (22) for each LM i in different threads. Ideally, this process could be scheduled such that F R X is ready before the planning time budget starts.

Multi-layered Motion Planning under Environment and Motion Uncertainty
The planning problem defined in Section 3.4 has three main requirements: (a) to consider the vehicle's motion constraints, (b) to validate probabilistic constraints in face of uncertainties, and (c) to meet online computation limitations. ‡ Only those voxels describing the known environment, i.e. free, occluded and occupied space, are considered in the computation of F X . Considering the unknown space with its P (v) = 0.5 in the computations would lead to a cumulative map with misleadingly over-estimating occupancy probabilities.

Constrained planner sample(·) extend(·) lift(·)
(a) Multi-layered planning scheme (b) Lead planner -RRT* (c) Constrained planner -SST Figure 5. Multi-layered motion planning framework: the lead planner shown in blue in (b) computes a geometric lead ξ (red path) to guide the search space X of the constrained planner shown in magenta in (c). The resulting trajectory ξ with its uncertainty (yellow funnel) satisfies kinodynamic and probabilistic safety constraints.
Our previous approach successfully addressed all these requirements formulating a single-layered sampling-based planning strategy in the belief space (Pairet et al. 2018). The planner in question (i) samples feasible states in the system's state space, and (ii) extends and validates the tree of motions in the belief space. This approach proved to be suitable for solving online motion planning problems in challenging realworld scenarios, but its applicability was limited to lowdimensional planning problems given the huge search space and the computational burden of all considered constraints. As discussed in Section 2, multi-layered planning strategies enable online planning in high-dimensional spaces. This motivates the use of such an idea to extend our framework's capabilities to suit the planning requirements of a larger group of robotic systems and environments. Principally, the extended planning strategy employs a multilayered planning scheme (see Section 6.1) to overcome the aforementioned scalability issues. Such a strategy allows deferring the computation of kinematic constraints (see Section 6.2) and probabilistic constraints (see Section 6.3) after identifying some subregions of the system's state space that potentially contain a solution to the planning problem.

Multi-layered Motion Planning
The capabilities of our previous planner (hereinafter referred to as the constrained planner) are extended to deal with problems of higher dimensionality by means of a multilayered planning strategy. As schematised in Figure 5, the proposed strategy adopts a sequential two layered planning scheme consisting of a lead planner and the constrained planner. The lead planner seeks to determine a subregion X ⊂ X of the entire state space that eases, and consequently speeds up, the search of the final trajectory ξ which accounts for all considered constraints (see Section 3.4). To this aim, the multi-layered scheme is designed as following: • Lead planner: employs the RRT* algorithm to rapidly find a path in the workspace W while optimising a desired cost. The obtained lead path is a nearly optimal geometric solution ξ ∈ W used to determine X via the lifting operator lif t : W → X detailed below. • Constrained planner: leverages the delimited search space X and the SST algorithm in (Pairet et al. 2018) to rapidly find the final solution ξ which meets kinodynamic constraints (see Section 6.2) and probabilistic safety constraints (see Section 6.3).
Although the planners within the multi-layered planning scheme could be different, the selection above suits the online requirements of our framework. This is, the framework's overall planning time ∆T M P is divided as ∆T M P = ∆T L + ∆T C , where ∆T L and ∆T C are the time budgets allocated to the lead and constrained planners, respectively. Then, given our selection of planners, the assignment of time budgets allows ∆T L ∆T C as (i) the lead planner is adept at providing quickly a suitable lead path, such that (ii) the constrained planner has at its disposal most of the time budget ∆T L ≈ ∆T M P to refine the final trajectory which accounts for all the considered constraints.
Given the aforementioned selection of planners and their operational space, the designed multi-layered planning scheme requires the lifting lif t : W → X . A common lif t(·) strategy is to define X as a tube around ξ with radius d for the geometric components of the state space, whereas the non-geometric components are left unbounded . The performance of this approach, however, is susceptible to the parametrisation of d; tight search spaces, i.e. small radius d, promote final solutions with lower cost than those obtained with bigger radius d. On top of that, relying on a fixed d requires hand-tuning such parameter to ensure that the final solution lies within X ; if X does not contain the final solution, the planner will lack probabilistic completeness. Adjusting d to ensure probabilistic completeness would prove to be a cumbersome task since the type of environment and planning constraints, among many other factors, should be taken into account.
Differently from other multi-layered planning schemes in the literature, ours uses a method of information interchange between planners that maintains the completeness and asymptotic optimality properties of the constrained planner when used in a standalone fashion (Pairet et al. 2018). This work builds on the idea of sampling around a lead path to present alternative definitions of X via the lif t(·) operator. In particular, the designed multi-layered planner exploits a mixture of samplers to trade-off the low-cost trajectories found when sampling around a lead path and the probabilistic completeness of uniform sampling. This manuscript proposes two mixture of sampling techniques: • Bias to rigid X : given a fixed radius d, the planner samples uniformly in X with probability p and uniformly over the space with probability 1 − p. • Adaptive X : the planner adjusts d within the range of a strictly guided sampling to a uniform search. Adjusting d can be conducted via some heuristics or as an optimisation problem subject to a cost function.
The performance of both approaches in comparison to a rigid X strategy is discussed in Section 7.2. Noteworthy, any of the two presented mixture of sampling strategies ensures probabilistic completeness of the overall multilayered scheme. As an extreme example, let us consider the scenario depicted in Figure 6c-6d, where the lead planner finds an asymptotically optimal solution through the farthest (most left) corridor. However, according to the probabilistic safety constraints defined in Section 6.3, such a corridor does not offer any safe passage. Despite the initial bias towards this unsuitable X , a mixture of sampling strategies, as the ones introduced in this section, permits finding a solution if one exists provided enough time, thus ensuring probabilistic completeness guarantees.

Planning Under Motion Constraints
The system's motion capabilities are considered in the constrained planner by expanding a tree with the system's motion model (2)-(3). In particular, the constrained planner employs the SST algorithm (Li et al. 2016) to build a tree in the belief space with state beliefs x ∼ b = N (x, Σ x ) as nodes. The tree expansion is based on two procedures: sample(·) and extend(·), which are conducted in the state and belief space, respectively (see Figure 5a). That is, sample(·) draws a random state x rand ∈ X , where X is a subregion of X as defined in Section 6.1. The planner then selects a node from the tree to attempt connecting to the randomly sampled state x rand . Such a selection is conducted via nearest neighbour in the state space using Euclidean metric. The selected node x near has a probabilistic representation in the belief space, i.e. x near is better described as x near ∼ b near = N (x near , Σ xnear ). Then, from this belief, the extend(·) procedure expands the tree in the belief space by evolving the system's motion model (2)-(3) with a randomly sampled control input u ∈ U. This expansion is done for a random period of time T prop . Since the considered motion model includes the system's uncertainty, each obtained belief (tree node) corresponds to a vehicle's state with its associated uncertainty (see Figure 7).

Planning Under Probabilistic Constraints
Given a minimum safety probability bound p safe , the imposed probabilistic constraint requires 1 − p collision (b, M) ≥ p safe for every belief b on the trajectory in order to probabilistically guarantee the robot's safety. In our approach, the environment awareness and the relative uncertainties are represented by the cumulative distribution F R X , which jointly encodes the density distribution of the perceived environment on a discrete support (see Section 5). As discussed previously, this representation of the environment favours efficiency for online mapping and planning applications. In fact, such encoding allows to guarantee 1 − p collision (b, M) ≥ p safe for each belief b of the tree as: where α is the confidence level on the computation of p collision,α (·) ∈ [0, α]. In other words, p collision,α (·) does not cover a (1 − α) span of the belief b over the state space. Therefore, it is assumed that the remaining (1 − α) is in collision to ensure probabilistic guarantees on the collision checking decision. All in all, this method can be exploited to trade a constant conservatism α in favour of performance. The probability of collision of a robot centred belief b R ∼ N (b R , Σ R b ) with the environment is: where ·, · F is the Frobenius inner product of the overlapping region between theb R -centred discrete state belief K α Σ R b k (see Appendix C) and the cumulative environment awareness F R X . The Frobenius inner product is an efficient operation via matrix vectorisation.
The overall proposed multi-layered planner leads to the exploration tree depicted in Figure 7, whose edges account for the vehicle's kinodynamic capabilities and whose nodes are probabilistically safe subject to the system's localisation, motion and environment uncertainties. Additionally, the expansion of the tree is also subject to states not leading to an inevitable collision, i.e. a state must allow for the vehicle to make a full stop before colliding.

Experimental Evaluation
The proposed framework has been implemented in robot operating system (ROS) and uses the facilities provided by the Octomap (Hornung et al. 2013) as the core building block of the adopted mapping strategy, and the OMPL (Şucan et al. 2012) to ease the motion planning requirements. This implementation has been used to evaluate thoroughly the different proposed features and the framework as a whole. This section presents the results of such analysis in an incremental fashion. First, the capabilities of the precedent version of the framework in simulated and real-world scenarios are analysed and discussed in Section 7.1. Then, Section 7.2 and Section 7.3 report the performance of the key components of the newly proposed framework, i.e. the multilayered scheme and the probabilistic collision checking. The potential of these components is individually evaluated against closely related state-of-the-art approaches. Finally, the capabilities of the new framework are demonstrated in Section 7.4 in different environments.

Start-to-goal Queries in Undiscovered 2D Environments
The experimentation reported next has been conducted using the precedent version of the online mapping-planning framework presented in this manuscript, which namely was limited to 2D environments and had a single-layered planning strategy (Pairet et al. 2018). Such a framework has been deployed on the Sparus II autonomous underwater vehicle (AUV) (see Figure 8), a nonholonomic torpedoshaped vehicle with hovering capabilities rated for depths up to 200m . The AUV is limited to operate at a constant depth, i.e. in SE(2), to meet the limitations of the precedent framework. Under these conditions, the motion model of the Sparus II can be approximated by a unicycle system as detailed in Appendix A.1. The AUV is equipped with a mechanical scanned imaging sonar (MSIS) to perceive the surroundings. The readings of such sensor are used to build a representation of the environment. We use the default parameters in (Hornung et al. 2013) of l min = −2 and l max = 3.5 corresponding to the occupancy probabilities P (v) = 0.12 and P (v) = 0.97, respectively, and l f ree = −0.4 and l occ = 0.85 which corresponds to P (v) = 0.4 and P (v) = 0.7, respectively. The decay rate in (16) is set to γ = 0.8. The framework's planning time is set to ∆T M P = 1.5s.
The test-bed to evaluate the precedent framework consists of two environments located in Sant Feliu de Guíxols (Spain): (a) breakwater structure that is composed of a series of concrete blocks (14.5m long by 12m width), which are separated by four-metre gaps (Figure 9a and Figure 9b), and (b) rocky formations that create an underwater canyon of 28m long (Figure 9c and Figure 9d). Using these environments, two experiments are reported: (i) evaluation of the overall performance of the framework in the underwater simulator (UWSim) (Prats et al. 2012) and (ii) validation of the framework in real in-water trials. Experiment (i) is conducted in both environments, while experiment (ii) is uniquely tested in the real breakwater structure scenario.
7.1.1 Simulated trials: before conducting in-water experiments, the framework was exhaustively tested in the simulated breakwater structure and canyon scenarios. In the former environment, 19 start-to-goal queries out of 20 attempts were successfully solved. Among those 19 successful experiments, the robot achieved the goal region B goal by crossing through the first four-metre gap in 17 occasions, while in the remaining two trials, the planner found a less optimal trajectory through the second four-metre passage. Figure 10 depicts the mission execution in one of those trials. In the initial part of the mission, the environment is completely undiscovered, finding a solution trajectory that goes straight to the goal ( Figure 10a). As soon as the trajectory gets invalidated (Figure 10b), a new collision-free trajectory is  computed ( Figure 10c). After some mapping-planning iterations, the robot gets out of the four-metre gap between two blocks (Figure 10d). In average, the calculated trajectory towards the goal has a length of approximately 45.2m and is completed within 2 21 . The start-to-goal query in the simulated canyon scenario has been successfully solved 20 times out of the performed 20 trials. The higher success rate with respect to the previous experiment is given by the nature of the environment; this scenario involves less abrupt manoeuvres and the passage is wider, more than twice larger though. Figure 11 depicts the trajectory calculated in one of those successful trials through the narrow passage in the middle of the canyon. In average, the calculated trajectories towards the goal have a length of approximately 58.4m and are completed within 2 59 .
After those 40 trials conducted in two different scenarios, the framework has demonstrated a satisfactory performance to proceed with the deployment in real-world.  7.1.2 Real-world trials: after the framework had demonstrated a satisfactory performance and high success rate in the simulated trials, it was deployed on the Sparus II AUV to prove its suitability for real-world robots with limited onboard computation power. The in-water experiments were conducted in the real breakwater structure located in Sant Feliu de Guíxols (Spain). The robot was required to solve a start-to-goal-query to reach a goal region B goal located on the opposite side of the structure, which can only be achieved by navigating through any of the narrow four-metre gaps. During those autonomous missions, the vehicle is connected to a wireless access point buoy for monitoring purposes. Prior experiments before running the entire framework revealed that the robot's behaviour when dealing with real conditions diverged from simulations in (i) the perception quality, being significantly noisier and (ii) the trajectory tracking performance, being slightly degraded because of the waves and currents. Noise on the observations was reduced by experimentally adjusting the range of the mechanical scanned imaging sonar (MSIS) at 10 metres.
Despite these challenging conditions, the framework successfully accomplished finding and driving the Sparus II AUV towards the desired goal region B goal through one of the narrow gaps in the breakwater structure § . The trajectory was found through the first corridor four out of five times, while in the other trial the robot went through the second gap. Figure 12 depicts Sparus II in one of those in-water trials and the trajectory calculated towards the goal, which has a length of 57.9m and took 3 07 .

Multi-layered Planning Scheme
The multi-layered planning scheme presented in Section 6.1 is one of the key features allowing to overcome the scalability issues of our previous single-layered planner (Pairet et al. 2018). Nonetheless, differently from current multi-layered approaches which rely on rigid definitions of the candidate search space X (rigid-X ), this manuscript explores two alternative definitions of X (biased-X and adaptive-X ) based on a mixture of sampling experts. This section reports § A complete sea-trial through the real breakwater structure can be seen in: https://youtu.be/dTejsNqNC00. the performance of these four strategies in the scenario depicted in Figure 13, where the planning problem defined in the belief space consists in reaching the state between the blocks while satisfying kinodynamic and probabilistic safety constraints subject to a p safe = 0.99 minimum safety probability bound. In this evaluation, the entire environment is considered to be known in advance and the system dynamics are approximated as described in Appendix A.2.
The four methods (single-layered planner (SLP) and multi-layered planner (MLP) with rigid-X , biased-X and adaptive-X ) are evaluated for their ability to quickly find a solution and for the cost of the resulting trajectory. The given total planning time budget is set at ∆T M P = 1.5s to emulate online planning requirements, which is distributed as ∆T L = 0.3s and ∆T C = 1.2s for the three multi-layered schemes. With this setup, each planner attempts to solve the defined planning problem for a total of 2,000 times. Figure 14 depicts the number of successfully solved trials and the resulting trajectory length when considering a rigid-X lead with radius parameterisations d ∈ [0 40]m. While d = 0m strictly limits the search space to those states forming the lead path, d = 40m spans the search over all state space of the defined planning problem, therefore resembling uniform sampling. As it can be observed, small search spaces (small d) endanger the planner's ability to find a solution with limited time. However, when a trajectory is found, the resulting cost is lower than those solutions found with wider X leads. Instead, these wide search spaces (big d) make the planner struggle at solving most of the planning problems due to the search space extent. In between these two extremes, a suitable parameterisation with d = 12m (dashed lines) enables solving most of the trials to the planning problem while providing a trajectory with low length cost. Nevertheless, there are not efficient means of defining the optimal d in advance since it is dependant on the planning problem and environment characteristics. Therefore, a rigid-X strategy is not suitable for applications which lack of a fully prior informative representation of the environment. Moreover, too restrictive guided searches can endanger the completeness guarantees of the planner.
The performance of those approaches which guarantee completeness, i.e. the single-layered planner (SLP) (green) and multi-layered planner (MLP) with biased-X (magenta) and adaptive-X (orange) strategies, is depicted in Figure 15. In particular, biased-X is parametrised with radius d = 12m (best lead definition according to experimentation in Figure 14) and analysed for different p ∈ [0 1], whereas  adaptive-X is defined as shown in Figure 16, i.e. with an initial radius d = 3m which increases at a rate of 20m/s. This naive implementation of adaptive-X adjusts d from a strictly guided sampling to a uniform search such that as t → ∞, d → ∞, i.e. X → X (uniform sampling).
As it can be observed in Figure 15, our precedent singlelayered planning scheme struggles at finding a solution on most of the trials. This is because sampling uniformly the entire high-dimensional belief space requires more time to find a solution than the affordable time budget in online applications. Slightly worse performance is obtained when Figure 15. Performance of (i) our precedent single-layered planning (SLP) scheme (green) and the newly proposed multilayered scheme when considering (ii) a fixed lead with different bias p (magenta, with best radius as found in Figure 14), or (iii) an adaptive lead as defined in Figure 16 (orange). After computing a lead path, the constrained planner leverages an adaptive X strategy to initially promote solutions with low cost (small d) before ensuring probabilistic completeness by sampling the entire space (d → ∞). Once a solution is found, X is fixed to let the constrained planner refine the found solution until the completion of the planning time ∆TMP .
using a multi-layered scheme with biased-X and p = 0 because it still uses uniform sampling but with a portion of the total planning time budget. However, as p → 1, i.e. the planner is more guided to the lead X (whose optimal radius has been determined empirically in Figure 14), the performance of the planner increases, in both number of solved trials and length of the final solution. Interestingly, the proposed adaptive sampling method endows the framework with a competitive success rate and solution length than when hand-defining the optimal radius.

Comparison of Probabilistic Collision Checking Methods
Sampling-based planners must be able to analyse the validity of a certain state accurately and efficiently. While accuracy is relevant to avoid discarding regions of the state space which in fact are collision-free, efficiency allows for more space exploration given a limited time budget. However, accurate calculations jeopardise the ability to validate a state rapidly, specially when accounting for uncertainty. In this regard, chance constraints formulations (Blackmore et al. 2011;Luders et al. 2013) offer an interesting accuracyefficiency trade-off which has proven to be suitable for many motion planning problems in the last decade (see Section 2). In fact, chance constraints formulations are still the most widely used probabilistic collision checking method among those state-of-the-art motion planning applications which account for uncertainty (e.g. (Strawser and Williams 2018;da Silva Arantes et al. 2019)). This motivates the use of chance constraints as the baseline reference to assess the proposed probabilistic collision checking algorithm. The performance analysis comprises two chance constraints formulations (Blackmore et al. 2011;Luders et al. 2013) and our method with four different parametrisations α = {0.90, 0.95, 0.99, 0.999}. Each method is assessed by its accuracy and efficiency. A method's accuracy is computed as its ability to correctly detect that a state is valid: as invalid. T P + F N is the total number of valid states according to the standard of truth. Therefore, the higher the value of the metric in (26), the more accurate the method is. For the method's efficiency, the analysis considers the average computation time to process the state validity . These two metrics are analysed subject to three variables relevant to motion planning problems under uncertainty: (i) number of obstacles n o in the environment, (ii) state uncertainty Σ x , and (iii) minimum safety probability bound p safe . With this setup, a single case study is parametrised by the triplet n o , σ x , p safe . In total, 847 case studies are retrieved according to the parametrisation span (minimum and maximum values) and discretisation defined in Table 2. Each case study is set up as follows. An environment M is defined in R 3 with a total of n o cubical obstacles. In order to have a computational representation of the scene suitable for each method, the environment is encoded as: (i) a set of linear constraints, where each cubical obstacle is characterised by six constraints, and (ii) a global occupancy grid map with 0.5m resolution. Then, given the known environment M, each probabilistic collision checking method is required to validate, subject to p safe , 10,000 beliefs b ∼ N (x, Σ x ). The state estimatex ∈ R 3 is uniformly sampled over X and the covariance matrix Σ x ∈ R 3×3 is assumed to be diagonal, i.e. Σ x = σ 2 x I 3×3 . The data extracted from the 847 case studies is depicted in Figure 17. In the interest of clarity, the corresponding discussion is divided into three parts: accuracy, efficiency and suitability.
Accuracy discussion: the accuracy analysis (first row in Figure 17) depicts that the number of obstacles in the environment is the variable penalising the methods' accuracy the most. This behaviour is due to the methods' conservatism, whose relevance increases with the hardness of the motion planning problem. In other words, the more conservative a method is, the more negatively affected it is. On top of that, the conservatism of chance constraints formulations (Blackmore et al. 2011;Luders et al. 2013) increases with the number of obstacles, whereas our approach accounts for a constant conservatism α. This tighter bound allows our method to outperform both chance constraints formulations, even when choosing the most conservative parametrisation α = 0.9. Higher values of α favour accuracy at the cost of more computational expenses (see discussion below). Importantly, the confidence level α of our method should always be set such that α ≥ p safe , otherwise the constraint in (24) will never be satisfied since ¶ The standard of truth is approximated by numerical integration of (6).
All experiments are performed with an Intel Core i7-7820X CPU @3.60GHz × 16 with optimised C++ implementation for all methods. the analysed part of the space is not sufficient to ensure probabilistic safety. This fact is visible in Figure 17c, where for α < p safe our method with parametrisation α is not used.
Efficiency discussion: the efficiency analysis (second row in Figure 17) reflects the expected computational complexity according to the theoretical grounds of each algorithm. This is, chance constraints strategies are fast for scenarios with few number of constraints, but their computational expenses grow linearly as the number of constraints increases. This linear correlation is influenced by the iterative nature of chance constraints, which allows to invalidate a state as soon as 1 − p collision (b, M) < p safe , i.e. without need to check all constraints. In other words, invalid states involve less time than those which are valid. Consequently, harder planning problems, i.e. those involving more obstacles, higher uncertainties or more restrictive safety guarantees, show a mild deviation towards lower computational time due to the presence of high number of invalid states. In contrast, the computational requirements of our method are uniquely influenced by the state uncertainty Σ z , which determines the number of voxels to include in the calculations (see Section 6.3). This might restrict the suitability of our approach to systems whose state uncertainty is bounded over time (see discussion below).
Suitability discussion: robotic systems operating in uncrowded environments, i.e. very few obstacles sparsely distributed in the space, might find chance constraints to be a suitable alternative. However, the accuracy and efficiency of such approaches scales poorly as the complexity of the motion planning problem increases, i.e. more crowded environments or higher uncertainties. As it can be observed in Figure 18a-18b, this behaviour endangers the ability of a planner to find a trajectory through tight apertures or narrow passages, even if one exists. If an alternative route towards the goal exists, the resulting solution will be larger than those trajectories found with less conservative approaches. Moreover, chance constraints require the representation of the environment to be a set of linear constraints, which can be prohibitively expensive to compute online, specially in applications where the environment is incrementally discovered.
In contrast, our approach trades a constant conservatism α in favour of accuracy and performance. This allows to deal with crowded environments efficiently while providing higher accuracy than chance constraints methods. Therefore, as depicted in Figure 18c, our probabilistic collision checking method enables a planner to find a solution through the tight corridors where chance constraints methods are over conservatist. However, our method involves higher computation times for highly uncertain states. This limitation might be relevant for systems with unbounded uncertainty, but most robotic systems are endowed with state estimation algorithms which keep the state uncertainty bounded over time. Alternatively, the parameter α can be adjusted to reduce the computation time while still guaranteeing safeness.
On the whole, the presented probabilistic collision checking approach proves to be a suitable strategy for a wide range of motion planning problems under uncertainty, even for those where chance constraints struggle at finding a solution. Moreover, our method is suitable for applications building a representation of the environment online, given that those usually exploit the efficient encoding of occupancy grid maps.

Start-to-goal Queries in Undiscovered 3D Environments
The proposed framework as a whole has been deployed on a simulated quadrotor unmanned aerial vehicle (UAV) (Meyer et al. 2012) equipped with a 3D Light Detection and Ranging (LIDAR). The considered environment is the Urban Stairwell scenario of the DARPA Subterranean Challenge 2019. This scenario is challenging due to its extensive workspace of 40.50 × 50.04 × 13.69 metres and all narrow passages that must be traversed to accomplish the requested start-to-goal motion planning query. Figure 19 illustrates the Urban Stairwell scenario altogether with the defined start-to-goal query. For this experiments, the quadrotor's dynamics are approximated to those of a fixed-wing plane as described in Appendix A.2 and the surroundings are mapped online from the sensor's data at a resolution of 0.2 metres. The remaining parameters of the mapping module are as those in the experiments reported in Section 7.1. During the mission, no localisation updates are considered to test the planner in the most adversarial conditions, i.e., large environmental and localisation uncertainties. The required probabilistic safety guarantees are p safe = 0.95 and the planning time is ∆T M P = 1.5 seconds, distributed as ∆T L = 0.3 seconds and ∆T C = 1.2 seconds. Figure 20 depicts some snapshots * * of the online mapping and planning procedure in the Urban Stairwell scenario of the DARPA Subterranean Challenge 2019. The proposed framework allows for probabilistically safe autonomous navigation in such a hostile and unknown environment. The mesh of the Urban Stairwell scenario is composed of a total of 108,512 faces. Although these faces could be potentially approximated online from the sensor's data and used as linear constraints in (Blackmore et al. 2011;Luders et al. 2013), it would imply checking for collision against 30 times more linear constraints than those considered in Section 7.3, for which chance constraints methods already showed poor performance due to their over conservatism. Instead, our framework is able to efficiently deal with these complex scenarios, as demonstrated with the experiments conducted in the Urban Stairwell scenario of the DARPA Subterranean Challenge 2019. * * A complete trial through the Urban Stairwell scenario of the DARPA Subterranean Challenge 2019 can be seen in: https://youtu.be/KV2QFglamZs.  (yellow). (c) When the previous trajectory is partially invalidated due to the incremental knowledge of the surroundings, the framework finds a new trajectory towards the goal. Note that the previously observed patches of the environment become more uncertain (greyish areas) as the robot moves.
(e-f) The entrance to the narrow stairwell is fully mapped and the framework successfully plans through it despite the considerable accumulated uncertainty. (g-i) As the robot moves into the stairwell, the framework continues iterating over the mapping-planning process to ensure save navigation until (j) reaching the goal region. (k) Incremental set of local maps composing the discovered environment during the mission, and (l) corresponding cumulative map F R X (only showing those voxels P (v) > 0.4 for visualisation purposes).

Conclusion
This paper has presented a novel end-to-end framework, which probabilistically guarantees the robot's safety when navigating in unexplored environments. The proposed approach is twofold: (i) incrementally maps the vehicle's surroundings to build an uncertain representation of the environment, and (ii) plans feasible trajectories (according to the robot's kinodynamic constraints) with probabilistic safety guarantees (according to the uncertainties in the vehicle's localisation, motion and mapping). Our proposed approach includes a multi-layered planning strategy which enables for faster exploration of the high-dimensional belief space, while preserving asymptotically optimal and completeness guarantees, and an efficient evaluation and tighter bound on the computation of the probability of collision than other uncertainty-aware planners in the literature. Overall, the framework is capable to deal with high-dimensional problems online while being suitable for systems with limited on-board computation power.
Experimentation conducted in simulation shows some of the theoretical qualities of this work. Additionally, simulated and real-world trials on an AUV and a quadrotor UAV demonstrated the suitability of the framework to guarantee the robot's safety while navigating in unexplored environments and dealing with real-robots constraints.
The framework is not restricted to the presented experimental evaluation nor a specific platform. Any other mobile robot, either terrestrial, maritime or aerial system can benefit from this work. The modularity of the proposed framework allows for multiple extensions and variations. Foremost, although the experimental evaluation of the proposed framework has been conducted considering the worst-case scenario of an open-loop navigation without uncertainty update, the framework can bear with periodic navigation updates as described in Section 4.2. An interesting possible feature that could be added to the framework is the use of the truncation trick, i.e. to uniquely propagate the posterior of the estimation which is in no collision. However, truncating the system's belief involves approximating the posterior to a Gaussian distribution. Another possible extension is leveraging the multi-resolution encoding of Octomaps to check the compliance of the safety guarantee at different resolutions. Formulating this process as a multi-resolution kernel checking could speed up computations even further. Finally, the conducted experimentation pointed out that automatically adjusting the replanning period might be beneficial, as well as studying more intelligent methods to leverage from the lead path or even prior solutions. more complex spatial relationships. The interested reader may wish to consult (Smith et al. 1990) for a more thorough explanation about Gaussian relationships than the brief introduction that follows.
where the kernel size m d is always odd, σ d is the standard deviation of Σ x along dimension d, and the critical value t α is computed from the desired confidence level α ∈ [0, 1] as: where φ −1 (·) denotes the quantile function of the d-dimensional Gaussian normal distribution describing the system's belief b. Table 3 shows some critical values t α according to commonly desirable confidence levels α for 1D, 2D and 3D Gaussian distributions. Noteworthy, the confidence level α involves a trade-off between computational performance and accuracy. On one hand, α bounds the extend of the resulting K α (·) over the belief space, thus determining the total number of voxels in the kernel and, consequently, having an impact on the computational load of the probabilistic collision checking formulated in (24). On the other hand, (24) introduces a constant conservatism α, implying that α must be selected such that α > p safe . Otherwise, the method will not find any valid state. This requirement is implicit in the probabilistic collision checking formulated in (24).
The value of each cell n ∈ K α (Σ x ) can be drawn from the corresponding Gaussian distribution as h D N (x |x, Σ x ), where h D is a normalising constant according to the kernel resolution h, and x is the point coordinate of n referenced atx. For the particular case of a multivariate Gaussian N (x, Σ x ) with diagonal covariance matrix Σ x , i.e., its elements can be written as Σ ij = σ i 2 I ij , where I ij are the matrix elements of the identity matrix (so I ij = 0 if i = j and I ij = 1). Then, the multivariate Gaussian with diagonal Σ ij = σ i 2 I ij factorises into a product of univariate Gaussians as: where for any arbitrary positive definite covariance matrix Σ x the resulting distribution is normalised. The property in (41) provides a computationally efficient strategy to build any d-dimensional kernel K(·) from 1D Gaussian signals. It is worth mentioning that the kernel computation can be conducted and stored offline for different kernel sizes. At running time, the planner would uniquely need to retrieve in a look-up table fashion the required kernel. Although this is an option to speed up performance of the presented probabilistic collision checking, the implementation in this work computes the kernels online.