A Bi-Level Probabilistic Path Planning Algorithm for Multiple Robots with Motion Uncertainty

For improving the system efficiency when there are motion uncertainties among robots in the warehouse environment, this paper proposes a bi-level probabilistic path planning algorithm. In the proposed algorithm, the map is partitioned into multiple interconnected districts and the architecture of proposed algorithm is composed of topology level and route level generating from above map: in the topology level, the order of passing districts is planned combined with the district crowdedness to achieve the district equilibrium and reduce the influence of robots under motion uncertainty. And in the route level, a MDP method combined with probability of motion uncertainty is proposed to plan path for all robots in each district separately. At the same time, the number of steps for each planning is dependent on the probability to decrease the number of planning. (e conflict avoidance is proved, and optimization is discussed for the proposed algorithm. Simulation results show that the proposed algorithm achieves improved system efficiency and also has acceptable real-time performance.


Introduction
In the running of multiple robots in warehouses, motion uncertainties happen unavoidably since there exist some disabled components of robots, the instability of networks, and the interference of people walking. Under this case, robots would not follow the designed paths and the coupled relationship between temporal and space domain for paths is broken. And there is no doubt that other robots are disturbed by the ones where motion uncertainties happen. Finally, the whole multirobot system would be influenced and this brings about chaos or even breakdown of the whole system. It can be seen that this poses a great challenge to the path planning of multiple robots in warehouses. erefore, taking motion uncertainty into consideration in the path planning of multiple robots is an issue worthy of attention and research.
As a kind of method taking motion uncertainty into consideration for the discretized map, Ma et al. [1] designed the path planning algorithm which contains relevant restricted principles that an agent cannot enter the vertex that is occupied by some other agents at the last time step. e conflicts can still occur and this would cause the breakdown of the multirobot system in warehouses. Another kind of method which can deal with the issues caused by motion uncertainties needs to replan the paths of multiple robots when the samplings of robots contain abnormal information. Li et al. [2] proposed a two-stage trajectory planning scheme where a directed graph with cost variable connection is established and cost variable depends on the reachability, the risk of collision, and different state constraints. Gombolay et al. [3] presented a centralized algorithm that handles tightly intercoupled temporal and spatial constraints and scales to environment with large number of robots, which aims at improving the efficiency firstly. However, the above method cannot guarantee the real-time performance. In [4], the authors proposed a hybrid architecture which combines centralized coordination with distributed freedom of action to achieve an appropriate interplay. e architecture takes the new sampling for the state of robots at each time step, and path planning algorithm would be adopted to coordinate paths of multiple robots when sampling information containing motion uncertainties . However, the real-time performance cannot be guaranteed and the efficiency of the whole system is reduced due to too frequent planning. e algorithm introduced in [5,6] based on generalized probabilistic roadmaps (GPRM) also coordinates paths of multiple robots under motion uncertainty by using the feedback information. In the algorithm, the passive planning strategy is introduced in the multiple traveling salesman problem (MTSP) solution methodology. However, the system efficiency is low and the scalability of the algorithm cannot be guaranteed because far more points need to be sampled. A novel method in [7] based on the sampling considering the motion uncertainty would propagate this delay. Robots' paths need to be replanned adopting the Pareto-optimal plan repair scheme. However, the problem concerned with the decreased system efficiency is still not solved.
As a classical method devoted to solving the uncertainty problem, the Markov decision process (MDP) is introduced to the field of multiple agents. Corresponding multiagent MDPs (MMDPs) have been designed for multiple agent planning. In [8], approximating methods are introduced to be combined with MMDPs for the path planning problem of a team of homogeneous microair vehicles (MAVs) towards a set of goals. is algorithm can guarantee that there are no conflicts. Ma et al. [9] proposed the use of robust plan-execution policies to control how each robot proceeds along its path with a 2-level MDP solver that generates valid plans. To optimize the solutions, a novel optimal solver is designed in [10] for the problem of motion uncertainties in the transitionindependent MMDPs. In [11], an equilibrium policy has been proposed in MMDPs. Based on the notion of macroactions, the path planning problem is transferred into the decentralized partially observable Markov decision processes (Dec-POMDPs) in [12,13]. Multiagent reinforcement learning (MARL) [14][15][16][17][18][19][20] solves the learning task in multiagent systems (MASs). Nevertheless, the reasoning agent would deal with learning tasks which grow exponentially according to the number of agents. erefore, the computational complexity of these methods based on MDPs is so high that the real-time scalability performance cannot be guaranteed.
Recently, the planning methods based on homotopy classes have been proposed to solve the problem of motion uncertainties. In [21,22], the coordination space is established, which is the base for the later path planning. By selecting homotopy classes of the paths which are collisionfree, a centralized controller is designed for the path planning problem under motion uncertainties. Furthermore, the encoding based on priorities, which is called the priority graphs, is introduced for the homotopic solutions to solve the coordination problem in [23]. For [21][22][23], the planning method based on homotrophy classes can be proved that there are no conflicts for the coordinating paths and make sure that all robots will eventually reach their destinations even when some robots are temporarily stopped by a delay disturbance. However, the quality of solutions is decreased to a great extent since this kind of method only considers the next step of motions and all the robots would follow the previous planned paths.
Although the above methods ensure that conflict can be avoided among the planning of multiple agents under motion uncertainty, problem still exists. e efficiency of a multirobot system with motion uncertainty cannot be guaranteed. erefore, the first objective of this paper is to design an algorithm which can plan conflict-free paths for multiple robots even when motion uncertainty happens. And another objective is to improve the system efficiency that includes the makespan of the whole system and the number of planning.
To attain these goals, in this paper, the map is partitioned into multiple districts and we propose a bi-level planning architecture that is composed of topology level and route level generating from above map.
In the topology level, the passing orders of districts are planned based on the weight of paths in topology map. And in this process, the weight changes dynamically according to the distribution of robots; the aim of planning in the topology level is to achieve the district equilibrium, which can in turn reduce the influence of robots under motion uncertainty and promote the system efficiency. In the route level, a MDP-based probabilistic method is proposed to plan path for all robots in each district separately. e MDP is combined with the probability of motion uncertainty to improve efficiency. And the number of steps for each planning is dependent on the probability to decrease the number of planning. We proved that the designed algorithm is conflict-free. And we also validated the system efficiency improvement of designed algorithm from the optimization discussion and simulation. e remainder of this paper is organized as follows. Section 2 provides the problem statement. In Section 3, the path planning algorithm for multiple robots is introduced. Section 4 proves that the proposed algorithm is conflict-free and discusses the optimization. Simulation validation results are reported in Section 5. Section 6 concludes our research.

Problem Formulation
e problem that this paper attempts to solve is the path planning of multiple AGVs with motion uncertainty in warehouses. In this problem, warehouse environment is discretized and simplified as a graph G(V, E) whose vertices V correspond to locations and whose edges E correspond to transitions between locations. Figure 1 shows the discretized map.
Let R � r (1) , r (2) , . . . r (n) be a set of robots. Time is discretized into time steps. To formally describe a plan, the path for a robot r (i) is a map p (i) which means that the location p (i) (t) of r (i) in each time step tϵN * belongs to the vertices V. A robot can either remain stationary or move to an adjacent vertex. Robots are allowed to move simultaneously but restricted to move on the same edge in opposite directions or move at the same vertex simultaneously to ensure that there are no conflicts among multiple robots. Figure 2 gives the illustrations of these restricted conditions.

Complexity
Let T � t (1) , . . . t (v) be a set of tasks which are generated continually. For each robot that has been assigned a task, it has a start position (pickup point) and a goal position (storage point). And the makespan of all the tasks is M. During the running of robots, motion uncertainty, which means that robots have to stop immediately, could happen at any time. And the probability of the uncertainty is p u . is paper assumes the following about the scheduling of multiple AGVs in warehouses: (i) All feasible routes are two-way bidirectional streets, which means that two robots can drive side by side and robots can drive in both directions on each feasible way. (ii) e warehouse is made up of shelves and feasible ways, which is shown in Figure 1 (not limited to this scenario). (iii) e locations of all the robots are known. (iv) e probability of the uncertainty is known.
(v) e robots in the state of delay will return to normal.
And after the recovery, these robots will be put into operation in the warehouse system. (vi) e task allocation is not taken into consideration in this paper.
In this paper, letters of the subscripts which are not in the parentheses are used for identification. Letters or numbers of the subscripts which are in the parentheses are used for index. erefore, the problem studied in this paper can be formulated in the following equationfd1:

The Bi-Level Path Planning Algorithm for Multiple Robots
In this section, the bi-level path planning algorithm is introduced. e aim of this structure is that district equilibrium would be taken into consideration in the topology level for reducing the influence of robots under motion uncertainty and distributed path planning in each district can improve system efficiency in the route level even when motion uncertainty occurs. e architecture of the algorithm is shown in Figure 3. It is divided into two levels generating from the map partition: topology level and route level. In the   Complexity 3 topology level, the algorithm plans the order of passing districts. In the route level, the algorithm plans paths for robots in each district separately, considering the motion uncertainty. And the introduction of map partition is as follows.

Map Partition.
In this paper, the discretized map in Figure 1 is partitioned into multiple districts D � d (1) , . . . , d (M) as shown in Figure 3. For the topology level, the generation of the topology map G ′ (V ′ , E ′ ) is listed as follows: (1) each vertex v ′ ∈ V ′ corresponds to district d (s) ; (2) each edge (v (s) ′ , v (s′) ′ ) ∈ E ′ represents that districts d (s) and d (s′) are connected. Let the horizontal edges and vertical edges be E h ′ and E v ′ . For the route level, all feasible ways of each district in the discretized map of Figure 1 are shown in the route map.

Topology Level.
In the topology level, the first step of the planning problem introduced in Section 3 is to plan the passing orders of districts for the robots which have been assigned tasks. At each time step, there might be multiple tasks assigned to robots. e number of these tasks is N and the robots corresponding to these tasks are R t � r (1) , . . . , r (t) (t ≤ n). According to the task t (i) which has been assigned to the robot r (i) , the districts which contain the start position and goal position of t (i) are d (s (i) ) and d (g (i) ) , respectively (here s (i) and g (i) represent the index of the corresponding district). To formally describe the plan results, the passing order of districts for a robot r (i) is a map o (i) : N * ⟶ D.
In this paper, T(T ∈ N * ) is defined as the macroscopical time, which is the independent discrete variable in the topology level. With the increase of T, the robot will move from the start district d (s (i) ) to the goal district d (g (i) ) . Different from the discrete time step, the macroscopic time represents the temporal space and corresponds to the district in topology level. erefore, the change of T represents the transfer of the robot in the topology level. And if districts o (i) (T) and o (i) (T + 1) are different, then the corresponding edge that robot r (i) will pass in the graph G ′ (V ′ , E ′ ) is defined as followsfd2: In this formulation, the edge E (i) (T) represents the transition from o (i) (T) to o (i) (T + 1), which is the moving direction for the robot r (i) . For the robot r (i) which has been assigned the task, there is the corresponding macroscopical time T s (i) and T g (i) when r (i) reaches the start district d (s (i) ) and goal district d (g (i) ) , respectively. en, the planning problem concerned with the districts can be formulated as follows:

Complexity
In this formulation of the optimization objective, L(·) is the length of the edge E (i) (T), which is the corresponding distance between the center of the two districts. Due to the length difference of the edges in the graph G ′ (V ′ , E ′ ), the durations of passing theses edges are different. In our problem, the length of the horizontal edge is twice that of the vertical edge. erefore, the duration of passing the horizontal edge is twice that of the vertical edge. For the other environment map with different layout, the same principle applies. According to this principle, the passing order for the robot r (i) should satisfy the following: Based on this principle, equation (2) can be rewritten as In the optimization objective of equation (3), C(·) represents the crowdedness of the edge E (i) (T) at macroscopic time T(here we assume that o (i) (T) and o (i) (T + 1) are different). And it is defined as follows: , which represents the number of robots in each district at the macroscopic time T. It can be obtained by the passing orders for each robot. Furthermore, these values are changing dynamically since robots are moving. en, the method of normalization can be adopted to calculate NC (o (i) (T)) (T) and NC (o (i) (T+1)) (T). Furthermore, due to the motion uncertainty of robots themselves, robots would not arrive at the planned district precisely. erefore, at eachmacroscopic time, there are multiple robots which are assigned new tasks and need new path planning in current warehouse environment plan, the normalized crowdedness in equation (6) should be recalculated according to the distribution and passing orders of all robots in topology level. To be specific, assuming that one robot is located at district d (s1) and will move to district d (s2) when there are some new tasks and the planning is needed at the macroscopic time T, if the edge corresponding to the transition between d (s1) and d (s2) belongs to E h ′ , then this robot would still stay at this edge at next time (T + 1) in equation (6). If the edge belongs to E v ′ , then this robot would move to next edge at macroscopic time (T + 1). erefore, the optimization objective in equation (3) considers the length of the edges which robots would pass and the crowdedness of the related districts, whose aim is to make sure that the paths of robots would not be too long and the districts which have been planned for these robots would not be too crowded. And this design can guarantee the balance among different districts and efficiency of the whole warehousing system. e constraints listed in equation (3) make sure that the algorithm can plan paths for robots from their start position to the goal position in topology level.
To solve the optimization problem in equation (3), K shortest path planning algorithm [24] is adopted to obtain the multiple shortest paths for each robot which needs to be planned at the moment. According to the planned paths, the passing order can be generated by using equations (4) and (5). e optimization objective can be calculated based on the rules introduced above, and the optimal solution among all the combinations can be selected, which contains the information concerned with the passing order for the robots assigned with new tasks in the topology level. Figure 4 shows an example of the planning results in topology level. And it can be seen that the planning results follow the analysis above; the algorithm in the topology level prefers to choose the paths which are shorter and less crowded since this can minimize the optimization objective mentioned in equation (3).

Route Level.
In the bi-level architecture, when the planning of the topology level has been finished, according to the distribution of the robots, the planning of each district would be conducted for the robots in this district distributedly.
In the route level, the route map is made up of all the feasible ways in each district. And the local route planning is to plan the actions of all the robots in the district being planned.
In the planning of each district, considering the motion uncertainty of robots and the probability p u , the planning step α is introduced that should be intuitively related to p u (α not only refers to the planning step at each planning but also refers to the interval between each two plannings). e aim of this idea is that the planning step can be adjusted dynamically based on p u and some redundant planning in [11,12] would not be needed anymore. Here, we define the planning step as the follows: It can be seen from this definition that there is an inverse relationship between the planning step S and the probability of motion uncertainty p u . When p u gets larger, which means that there is higher probability that robots would not follow the previous planned path, the planning step α becomes smaller and the algorithm would plan paths for robots more conservatively. e first aim of computation for probability of motion uncertainty is to guarantee the real-time performance of the planning in route level since the traditional replanning all the paths from the current position to the goal position. However, the planning step is dependent on the probability of the motion uncertainty. Another aim of this is the planning step is adjusted based on p u , which can prevent the situation that frequent replanning might destroy the optimized planned paths.
In the route level, the planning of each district can be formulated as the following optimization problem: In the formulation of this optimization objective, π (x) represents one policy and there are k policies which are π (1) , π (2) , . . . , π (k) . Each policy contains the information concerned with actions of the robots at each time step in the district.
And for each policy, (1) indicates the policy at the first time step for all the robots in the district being planned. And π (x),(1) � (π (1) indicates the policy at the first time step for the robot r (1) and m is the number of the robots located in the district being planned. And π represent the α step policy of r (1) . E(·) indicates the expectation of the reward which is similar to that of MDP. e aim of designing this objective function is that this can obtain the probabilistic optimal solution since there exists motion uncertainty of robots. λ(0 < λ < 1) is the discount factor. λ has relationship with p u as equation (9). e aim of equation (9) is that the discount factor can be adjusted dynamically based on and when is high; the long-term reward is not reliable anymore and the path planning based on MDP is concerned more on the optimization of total short-term reward.
S (1) , . . . , S (α) represents the state from the start time step when the current district needs to be planned to the time step which is (α − 1) time steps after the start time step. And for S (1) , it contains states for all the robots in the district, which means that S (1) � (s r (1) (1) , s r (2) (1) , . . . , s r (m) (1) ). e state s r (1) (1) indicates the state for the robot r (1) after the action π r (1) (1),(1) is taken. R(·) defines the reward for each state based on the distribution of the robots in the corresponding district.
erefore, according to the Markov decision process in [8], the optimization objective in equation (8) implies the expectation of the rewards for the state from the start time step to after (α − 1) time steps. And it can be rewritten as follows: where S (q) ′ indicates the possible state after the action π (x),(q−1) is taken at the state S (q−1) since there exists the motion uncertainty of robots, R(S (q) ′ ) represents the reward of the robots in this district at the state S (q) ′ , and P S (q−1)π (x),(q−1) (S (q) ′ ) is the probability that the state of robots in this district would turn to S (q) ′ when the action π (x),(q−1) is taken at the state S (q−1) . e probability can be calculated as follows: where s r (j) ′ (q−1) is the state of the robot r (j) after the action π r (j) (x),(q−1), is taken at state s r (j) (q−1) and P s r (j) is the probability that the state of the robot r (j) will change into s after the action π r (j) (x),(q−1) is taken at state s r (j) (q−1) . And this probability is dependent on the motion uncertainty and can be represented as follows: For the reward R(S (q) ′ ) of the policy π (x) , it is related to the crowdedness of the district and the distances among robots in the district, which can be defined as follows: where Cd (x) ′ is the normalized crowdedness of the district being planned when the policy π (x) is taken. And the crowdedness Cd (x) of the district being planned can be calculated as follows: where distance (i)(j) is the distance between r (i) and r (j) at state S (q) ′ . en, the crowdedness Cd (x) needs to be normalized by using the principle of the softmax according to the following formula: In equation (13), value(s . It is defined as the distance between the robot r (h) and the next district which r (h) heads to since the passing order has been planned in the topology level. e rule for calculating this distance is based on the Manhattan distance.
To avoid the conflicts among multiple robots, the factor ε is introduced. And ε in equation (13) is defined as follows: It can be seen from equation (16) that when there would be a conflict between robot r (i) and r (j) , ε is set at ∞. When a robot would move to the location at next time step where another robot is staying at for now, then ε is set at e 1/(1− p u ) . Otherwise, ε is set at 1.
According to equations (13) to (16), the first aim of this design is to prevent that each district would be too crowded since this would cause the coordination space to be too small, which means that once there happens motion uncertainty in some robots, it would be hard for the algorithm to find the solution and the system efficiency would be decreased. Another aim for the set of ε is to prevent the conflicts and make sure that one robot would not choose the action which might cause it to be next to another one. And this is beneficial for the motion coordination of robots since the motion uncertainty of one would cause it to still stay at the previous location, which increases the probability of the conflicts. erefore, this setting can decrease the probability of the situation like that.
To solve the optimization problem in equation (8), the policy set π needs to be generated in the district and the policy with the minimum objective would be selected as the solution. And the principle for the candidate action set of the policies is that robots would not be far away from the next district after taking this action. ere are two benefits for this principle. Firstly, the farther the distance from the next district, the larger the objective function in equation (8). Secondly, due to the introduction of this principle, the candidate actions will decrease, which can in turn lower the computational complexity. According to above principle, there are two kinds of candidate action sets when robot is in different locations of each district: (1) When robot is in any location of district and not coming into the boundary, the candidate action set of robot is denoted as left, up, still . e action left means that robot drives to the other side of two-way street at next time step, up means that robot follows current side of street, and still means that robot stays still at next time step. For example, in Figure 5, robots r (1) and r (3) are in the district and not coming into boundary at next time step. erefore, their Complexity candidate action set includes left, up, and still as orange arrows show. And as shown in Figure 5, the boundary between two districts is defined as an area composed of four grids, and one half of the boundary belongs to one district and the other belongs to the neighboring district. Considering that path planning in the route level is conducted separately in each district, robot should only follow one direction (drive to the right) in the process of traversing the boundary to the neighboring district, for avoiding the conflict. (2) When robot is in the boundary of district or coming into the boundary, the candidate action set of robot is denoted as up, still or left, still { } according to the rule that robot should only drive to the right in the process of traversing the boundary. For example, robot r (2) in Figure 5 is coming into the boundary at next time step and its candidate action set is left, still { } which means that it needs to drive to the right side of two-way street firstly. r (4) and r (5) are in the boundary and their candidate action set is up, still .
According to the above candidate action set, the generation process of robot r (i) and policy π r (i) (q) is shown in Figure 6. And it can obtain multiple policies for each robot. en, π can be generated with the combination of multiple policies of each robot in the district.

Conflict Avoidance.
In this paper, we propose a bi-level path planning algorithm. e architecture of the proposed method is divided into topology level and route level. And in the topology level, the passing order of districts is planned for robots. ere is no conflict in topology level because the districts are allowed to traverse for multiple robots at the same time. erefore, the conflict avoidance of the proposed algorithm in the paper mainly depends on whether or not it could plan conflict-free path in route level. And we try to prove it in the following proposition.

Proposition 1.
e path planning in route level could plan conflict-free paths for all robots in the district.
Proof. In route level of the proposed method, the path for robots in the district is planned for many times and α steps with MDP at each time. erefore, we try to use mathematical induction to prove the conflict-free path can always be planned by the proposed method in the district at each time. And we firstly describe some symbols of some variables to facilitate the expression of proof.
For any district d (s) ∈ D with m robots, the route map can be denoted as G p (V p , E p ). And robots in the district can be denoted as R p � r (1) , . . . , r (m) . n re represents the n re th path planning in each district. In the n re th planning: the beginning of time step is denoted as t n re , and the planning step is α. e solution with MDP is π n re and planned path of π n re for any robot r (i) is [p (i) (t n re ), p (i) (t n re + 1), . . . , p (i) (t n re + α)]. e proof detail is as follows: (a) Firstly, when n re � 1, prove that the path planning is conflict-free: At the first time of path planning, robots are leaving the boundary in each district. ese robots of each district are heading to different directions and located in the different sides of two-way street in the map. erefore, there always exists one solution π 1 that makes sure that all robots go on with their current side of two-way street in the α time steps and no robots from the opposite direction run in the same side of two-way street. And the planned paths for robots meet equation (17). It can be concluded that reward of the above solution π 1 according to equation (16) will not be ∞ and π 1 could be feasible solution of the optimization objective, which could prove that the proposed method could plan conflictfree path for all robots when n re � 1.
p (i) (t) ≠ p (j) (t + 1) and p (j) (t) ≠ p (i) (t + 1), ∀i, j ∈ R p , t n re ≤ t ≤ t n re +1 , (b) Secondly, when n re � k 1 , suppose that the planned path in the k 1 th planning is conflict-free and prove the planned path in the (k 1 + 1)th planning is also conflict-free: Green squares represent the robots in the district. e blue arrow indicates the next district where these robots are heading to. erefore, the orange arrows for each robot represent the candidate actions they can take. Besides the actions represented by these orange arrows, staying still is also included in the candidate actions for each time step.

Complexity
Considering that the path planning in the k 1 th planning is conflict-free, the ending location p (i) (t k 1 + α) of planned path for any robot r (i) ∈ R p satisfies equation (17), which means that there does not exist the situation of direction conflict and location. On the other hand, the ending location p (i) (t k 1 + α) of the k 1 th path planning is equivalent to the beginning location p (i) (t k 1 +1 ) of the (k 1 + 1)th path planning. If there exists v1 i , v2 i ∈ R p where v1 i represents the next location from with the action of going up and v2 i represents the next location from p (i) (t k 1 +1 ) with the action of going left, equation (18) can be concluded. It means that each robot r (i) can go left or up without conflict with other robots. erefore, it can construct a candidate solution π k 1 that makes robots heading to the same direction run in the same side of two-way streets in the α steps of path. And when robots heading to the different direction run in the same side of two-way street, the planned path will not have conflict with other robots. erefore, it can be proved that the planned path in the (k 1 + 1)th planning is conflict-free.
and when robots go to the boundary in the last time planning, considering that the robots always follow one direction when they are coming into boundary or traversing on the boundary, the district-distributed way of the proposed method in route level will not have conflict between robots from different districts.
To summarize, there always would be conflict-free solution for path planning in the route level with the proposed method. Optimization. For a multirobot system with each robot subject to uncertainty, directly computing the optimal solution is very difficult and it must recompute many times during the running of robots. erefore, it is hard to directly prove the optimization or near optimization. In this section, we will discuss the optimization of the proposed method. Considering that the architecture of proposed method in this paper is divided into two levels and paths are planned separately, we discuss the optimization of proposed method from following two parts: (1) For path planning in topology level: in the topology level, the optimal orders of districts for all robots can be obtained and the reason is as follows. Firstly, K shortest path planning algorithm [24] is adopted to obtain the multiple orders of passing districts for each robot. And combinations of orders of passing districts among robots can be generated. When K is large enough, all possible combinations of orders of passing districts can be obtained. Secondly, equation (3) is put up with in the topology level, which aims to minimize the corresponding distance between the center of the two districts and total crowdedness between districts among all the robots assigned with new tasks. And above optimization objective can be calculated based on all possible combinations of orders of passing of districts for robots. (2) For path planning in route level: the proposed method takes path planning for all robots in each district for many times and plans paths of α steps for the robots at each time until all robots leave the district. So, it is difficult to directly prove the optimization. In this part, we compare above method with existing method in [8] to discuss the optimization of the proposed method. Similar to the proposed method, the existing method also takes path planning in each district for many times. e difference is that at each time planning, the existing method plans the path of static α global steps with the MDP method and executes the first α steps of them. e optimization objective is shown in equation (19). α global can make sure that the destination of planned path is the location where robots are leaving from the current district. In the existing method in [8], α is also intuitively related to p u following equation (7). e existing method in [8] has proved the optimization.
For discussing the optimization of proposed method, we try to discuss the similarity of first α steps for solution at each time planning compared with existing method in [8] in the following two situations: Complexity 9 (a) When the motion certainty probability p u is low (p u ≤ 0.2): In this situation, α is close to α global according to (7). And it is obvious that the first α steps of solution from the proposed method and existing method in [6] are close to the same. For example, Figure 7 shows a situation where three robots are located at different states, and Table 1 shows the different solutions of path planning for the situation in Figure 7 using above two methods when p u is different (α global is set to 10 for the existing method in [8]). As shown in Table 1, when p u is 0.1 or 0.2, first α steps of the proposed method are the same as the existing method in [6]. (b) When the motion certainty probability p u is high (p u > 2): In this situation, consider that both the proposed method and existing method in [8] choose the MDP method where the discount factor λ exists in optimization objective and has relationship with p u according to equation (9). erefore, when p u is high, λ α in (19) will be lower. is means the last (α global − α) steps of solution have little influence on the computation for equation (19). And in this situation, the first α steps of the proposed method are close to the existing method in [6]. For example, in Table 1, when α is close to α global according to (19), it is obvious that the solution from the proposed method is close to first α steps of solution from the existing method in [8]. Figure 7 shows a situation where three robots are located at different states, and Table 1 shows the different solutions of above two methods when p u is different. As shown in Table 1, in the example of Figure 7, when p u is 0.4 or 0.5, although the solution of above two methods is not exactly the same, the solution from the proposed method is close to the first α steps of solution from the existing method in [8].

Simulation Result and Analysis
In order to evaluate the proposed method, simulations were implemented under the Matlab environment with Intel 3.60 GHz Core i7-4790U CPU and 8G RAM. e map of warehouse is shown in Figure 1. e motion uncertainty p u ranges from 0 to (1/2) to simulate the situations under different motion modes. e number of robots is set at 10, 20, 30, 40, and 50. e number of tasks is 1000. e makespan, number of planning steps, and computation time are contrasted with those of the existing method in [21,22] under a different number of robots. And for the existing method in [21,22], the probability corresponding to the disturbance intensity whether during the following second the robot will be prevented from moving is set to p u and the Step 2 Step α π (q) Figure 6: Generation of policy π r (i) (q) for r (i) in each district. Assuming that the candidate actions in each district include left, up, and still, the generation of policy π r (i) (q) for robot r (i) is shown in this figure. In this tree structure, different level of branches corresponds to different steps. Each circle represents the action for r (i) in the district at the corresponding time step. For example, the action left means that all the robots in the district should go left. A complete policy π r (i) (q) from the first time step to the αth time step is illustrated in the blue box. Multiple policies for each robot can be generated from this tree structure. lower bound which represents the best possible travel time is set to the shortest distance from start to end for each robot. e average of 20 repeated simulations is shown in Tables 2-4. e tasks have been generated before simulations. Table 2 shows the simulation results of average makespan of the existing method and the proposed method. Standard deviations are also shown in Table 2. Decrease percentage of makespan (the last column in Table 2) is calculated according to the following formula:

Makespan.
where makespan 1 and makespan 2 represent the makespan of the existing method in [21,22] and the proposed method, respectively. Graphical summaries of simulation results of makespan are shown in Figure 8.

Average Time.
Compared to the existing method in [21,22], it can be seen that the proposed method in this paper effectively decreases the makespan. When the number of robots is relatively small, the decrease percentage is low because the robots are located sparsely in the warehouse environment and the motion uncertainties of robots would not influence too many other normal robots, which means that most of the normal robots could still follow the previous planned paths. As the number of robots increases, the decrease percentage of makespan is greatly improved because the number of influenced robots would increase further, which means that the proposed method would replan to find more optimal paths based on the probability p u for more robots to avoid robots which are in the motion uncertainty. Under this circumstance, by combining MDP, redundant waiting of the method in [21,22] is not necessary. When the value of p u increases, the proposed method could still improve the system efficiency and has a smaller variance compared to the existing method in [21,22]. e percentage of decrease can be up to 27.9% when the number of robots is 50 and p u is 0.5.

Variance.
Compared to the existing method in [21,22], the variance of the proposed method is smaller. Furthermore, with the increase of the number of robots, the variance in the results of the proposed method increases less than that of the existing method in [21,22]. e comparison between the results of the proposed method and the existing method in [21,22] means that the proposed method has less randomness and is more stable. e reason is that the method in this paper would optimize paths of all the robots depending on the probability p u .
In the proposed method, unlike the existing method in [21,22] that would solve the problem of motion uncertainty through a waiting strategy, the proposed method to coordinate paths of robots under motion uncertainties adopts the strategy of replanning new paths to choose the probabilistic optimal solution. erefore, when motion uncertainty happens, all the normal robots would follow the paths that could reach the destination as soon as possible from the global view instead of adopting the strategy of going forward and waiting.
erefore, compared to the existing methods in [21,22], the decrease of the efficiency induced by the motion uncertainties could be lowered. In this way, the efficiency of the whole system could be improved when motion uncertainties happen by adopting the proposed method. Compared to the existing methods, the efficiency could be increased, which is beneficial for the overall running of the system.

Number of Planning
Steps. Percentage of planning (listed in Table 2) is calculated according to the following formula: where N re and makespan represent the number of planning and the makespan of corresponding method, respectively. e makespan of each method is given in Table 2. Graphical summaries of simulation results of planning percentage are shown in Figure 9. It can be seen from Table 3 that the percentage of planning of the proposed method is much more smaller than that of the method in [21,22]. It has to plan at each time step since the method of [21,22] is a kind of reactive method. erefore, the percentage is 100%. On the contrary, the proposed method in the route level would only plan when robots have finished the path of planning step α or when the motion uncertainty happens. erefore, the percentage of planning would decrease, compared with the existing methods in [21,22].
Another conclusion we can get from Table 3 is that when the probability p u is small, the percentage reduction compared with the method in [21,22] is larger and when the probability p u increases, the percentage reduction becomes smaller. e reason for this is that when p u is small, there would not be too many time steps when motion uncertainties happen, which means that there would not be too many time steps when planning is needed. erefore, the percentage of planning is relatively small, which causes the percentage reduction to be larger. When p u becomes larger, the time steps when motion uncertainties happen become more, which causes more planning. erefore, the percentage of planning increases, which causes the percentage reduction to be smaller. However, it is noteworthy that even if the percentage of planning becomes larger when p u increases, the percentage of the proposed method is still fewer than that of the method in [21,22]. Table 4 shows the average of the computation time under different p u . e statistics in the table demonstrate that the proposed method could satisfy the requirement of the real-time performance. e reason that the proposed method is higher than that of the existing method in [21,22] is that the existing method only plans for the next time step while the proposed method needs to plan the whole path to the destination. erefore, the computation of the proposed method would be larger than that of the existing method. However, as shown in Figure 10, the trend of computation time is more close to linear relationship for the proposed method and increasing speed will reduce slowly with robot number increasing. On the other hand, even if the computation time is somewhat larger, the proposed method could still plan conflict-free paths for all the robots in real time, which can be applied for the lifelong scheduling in warehouses. e real-time performance of the bi-level probabilistic path planning algorithm for multiple robots with motion uncertainty is guaranteed by the bi-level planning architecture and the distributed planning in each district. e topology level would plan the passing orders of districts for all robots assigned with tasks and the path planning is conducted in each district for route level. e planning in each district only needs to take the actions of the current district being planned into consideration. erefore, the    number of planning depends on p u , which means that the computation is adjusted by p u .

Conclusion
For improving the system efficiency when there are motion uncertainties among robots in the warehouse environment, this paper proposes a bi-level probabilistic path planning algorithm. In the proposed algorithm, the map is partitioned into multiple interconnected districts and the architecture of proposed algorithm is composed of topology level and route level generating from above map: in the topology level, the district crowdedness is taken into consideration to optimize the passing orders of districts and achieve the district equilibrium; in the route level, path planning is conducted in each district with a MDP-based probabilistic method, where the planning step is related to the probability of the motion uncertainty to improve the efficiency and guaranteeing the real-time performance.
Furthermore, the conflict avoidance of designed algorithm is proved and optimization is discussed. e simulation results validate that the designed algorithm can improve the system efficiency and the percentage can be up to 27.9% in the designed warehouses environment of this paper, which also accords with optimization discussion. It can be also seen that the number of planning has been reduced to a large extent.
Our future work will involve the further optimization of paths by taking the interaction of flow among different districts into consideration since the actions are only obtained by the distributed planning in each district. Another aspect is that the real experiments are needed to verify the effectiveness of the proposed method.
Data Availability e source code and simulation data of the proposed method and compared method in [18,19]   ey can also be found at https://github.com/SourceCode2020/Bi-level_Probabilistic_Path_Planning_Algorithm.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.