A Differential Game Approach to Multi-agent Collision Avoidance

A multi-agent system consisting of N agents is considered. The problem of steering each agent from its initial position to a desired goal while avoiding collisions with obstacles and other agents is studied. This problem, referred to as the multi-agent collision avoidance problem, is formulated as a differential game. Dynamic feedback strategies that approximate the feedback Nash equilibrium solutions of the differential game are constructed and it is shown that, provided certain assumptions are satisfied, these guarantee that the agents reach their targets while avoiding collisions.


I. INTRODUCTION
The study of multi-agent systems is a fast-emerging field in control engineering [1]-[3]. One of the main motivations behind this area of research is that a team of "simple" agents can collectively perform "complex" tasks. Many application areas exist for such multi-agent systems. Typically, the agents are expected to solve a task collaboratively or to maintain certain positions relative to one another. The terms collaborative control, cooperative control, and formation control are often used to describe such problems [4]-[16]. In the context of formation control, most of the proposed approaches are based on the notion of navigation function, introduced by Rimon and Koditschek in [17] for single agents, which is constructed from geometric information on the considered topology and then employed to define gradient descent control laws. This concept has recently been extended to the multi-agent scenario, both in centralized [18], [19] and decentralized [20], [21] implementations. In [6], [7], the problem of continuously monitoring a region using a team of unmanned aerial vehicles has been formulated as a differential game for which approximate solutions have been found using the methodology developed in [22]. Many research topics within the area of multi-agent systems are inspired by naturally occurring systems, such as schools of fish, migrating birds, and swarms of bees [23]-[29].
Although it is common to study problems in which the agents in a multi-agent system solve a task collaboratively, there are scenarios in which the agents have individual, and possibly conflicting, goals. Differential game theory introduces a framework to study problems in which several players seek to attain individual goals, which may or may not be competing [30]-[33]. It therefore appears natural that differential game theory can be useful to study and solve problems involving multi-agent systems [34], [35].

(M. Sassano is with the Dipartimento di Ingegneria Civile e Ingegneria Informatica, Università di Roma "Tor Vergata", 00133 Roma, Italy; e-mail: mario.sassano@uniroma2.it. Digital Object Identifier 10.1109/TAC.2017.2680602.)
In this paper, we consider a team of mobile agents. We focus on the problem of controlling these agents from their given initial positions to a set of predefined targets while avoiding collisions with static obstacles as well as collisions with other agents. This problem is referred to as the multi-agent collision avoidance problem. Preliminary results have appeared in [36]. The game introduced herein is a nonlinear differential game for which feedback Nash equilibrium solutions are sought. However, since obtaining such solutions relies on solving a set of coupled partial differential equations (PDEs), for which closed-form solutions are not readily available, it is necessary to settle for approximate solutions. In [37]-[39], two methods for constructing dynamic feedback strategies for a class of nonlinear differential games have been developed. Using the machinery developed in [39], we construct dynamic feedback strategies which approximate the feedback Nash equilibrium solution of the differential game describing the multi-agent collision avoidance problem. Furthermore, we show that, subject to certain natural assumptions being satisfied, these strategies guarantee that all agents reach their targets while avoiding collisions with obstacles or other agents. The method allows us to systematically construct a Lyapunov function yielding local stability and asymptotic convergence of the agents to the desired targets. This constructive result is achieved in two steps. First, we define, for each agent, a matrix-valued function which is similar in spirit to a standard navigation function. Second, this function is modified by the presence of additional dynamics, and the resulting value functions are smooth, hence yielding smooth control laws.
The proposed differential game formulation endows the value functions with an interesting property: given the initial configuration of the agents, evaluating these functions allows us to assess a priori the performance, individually for each agent, of the control strategy (in terms of distance from obstacles or interagent collisions during the entire movement). In addition to providing a novel perspective on the collision avoidance problem, the differential game approach adopted in this paper paves the way for several extensions in relation to control of multi-agent systems, such as the incorporation of multiple simultaneous objectives and control design under communication constraints (see, for example, [40]).
The remainder of the paper is structured as follows. The multi-agent collision avoidance problem is introduced and formulated as a differential game in Section II. In Section III, the solution to the problem is presented. Finally, simulations illustrating the theory are presented in Section IV, before some concluding remarks are provided in Section V.

II. PROBLEM FORMULATION
In this section, the multi-agent collision avoidance problem is introduced and formulated as a differential game. The problem is studied in a centralized framework, in which the positions of each agent are available to the remaining members of the group at all times. We consider a team of N agents moving on the ground (Euclidean plane), possibly characterized by the presence of (static) obstacles. In particular, each agent is described by single-integrator dynamics, i.e.,

ẋ_i = u_i, (1)

where u_i ∈ R^2 is the control input of the ith agent and the position of the ith agent is denoted by x_i ∈ R^2. Note that x_i and u_i represent the position and the velocity of the ith agent on the Euclidean plane, respectively. Suppose that each agent is associated with a desired goal, namely a target position x_i^* ∈ R^2, i = 1, . . . , N. Moreover, let x̃_i denote the error variable between the current position of the ith agent and its corresponding target position, i.e., x̃_i = x_i − x_i^*. The problem then consists of steering each agent from its initial position to its goal while avoiding collisions. Each agent i is associated with a parameter r_i > 0, which plays the role of a safety radius. Since the team may consist of heterogeneous agents, e.g., agents of different sizes or shapes, individual values of the safety radius may be associated with each agent. Suppose that there are m ≥ 0 static obstacles and let p_j^c ∈ R^2 and P_j ⊂ R^2, j = 1, . . . , m, denote the center of mass of the jth obstacle and the region of the Euclidean plane that it occupies, respectively. The standard notation ∂P_j is employed to denote the boundary of the region P_j. In what follows, elliptical obstacles are considered, i.e.,

P_j = {x ∈ R^2 : (x − p_j)^⊤ E_j (x − p_j) ≤ ρ_j^2}, (2)

where ρ_j > 0 and E_j = E_j^⊤ > 0.

(This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/.)
There is a one-to-one relation between the triple (p_j, ρ_j, E_j) and the physical parameters of the ellipse, i.e., the center of mass p_j^c and the lengths of the semiaxes. This relation transforms the description of the ellipse in (2) into the canonical representation of the ellipse. In the case in which the obstacle is circular, E_j = I, p_j is the center of mass and ρ_j is the radius of the circle.
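As an illustration of this conversion, the sketch below assumes the quadratic-form description P_j = {x : (x − p_j)^⊤ E_j (x − p_j) ≤ ρ_j^2} (consistent with the circular special case above, but an assumption, since the paper's display (2) is not reproduced here) and recovers the semiaxes and orientation from an eigendecomposition of E_j.

```python
import numpy as np

def ellipse_canonical(p, rho, E):
    """Convert {x : (x - p)^T E (x - p) <= rho^2} to canonical parameters:
    center, semiaxis lengths (major first), and rotation of the major axis."""
    lam, V = np.linalg.eigh(E)            # eigenvalues ascending, orthonormal columns
    semiaxes = rho / np.sqrt(lam)         # smallest eigenvalue -> longest semiaxis
    angle = np.arctan2(V[1, 0], V[0, 0])  # direction of the major semiaxis
    return np.asarray(p, dtype=float), semiaxes, angle

# Circular obstacle (E = I): both semiaxes equal the radius rho.
center, axes, phi_a = ellipse_canonical([0.0, 4.0], 2.0, np.eye(2))
```

For E_j = I this returns semiaxes (ρ_j, ρ_j), matching the circular case described in the text.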
Remark 1: If a static obstacle is not elliptical, possibly even in the presence of nonsmooth edges, it is possible to enclose the obstacle within an ellipse, thus smoothing the obstacle. This can be achieved by exploiting the notion of geometric moments of the portion of the Euclidean plane that constitutes the obstacle [41], [42]. In fact, the moments up to order 2 are related to the geometric parameters of the smallest ellipse that contains the region of interest, see e.g., [42].
The ith agent is guaranteed to avoid collisions with the jth obstacle if it does not cross the boundary ∂P j . We define the obstacle avoidance region and collisions between an agent and a static obstacle as follows.
The obstacle avoidance region, denoted by S, is defined as S = ∪_{j=1}^{m} S_j.

Definition 2: A collision between the ith agent and a static obstacle is said to occur if there exists a time instant t̄ ≥ 0 such that x_i(t̄) ∈ S. The ith agent is said to collide with the jth obstacle if there exists a time instant t̄ ≥ 0 such that ‖x_i(t̄) − p_j^c‖^2 ≤ (r_i + ρ̃_j(φ(t̄)))^2, where ρ̃_j(φ) denotes the radius of the ellipse P_j in polar coordinates as a function of the angle φ of the segment connecting x_i(t̄) and p_j^c, relative to the polar description of p_j^c, i.e., (p_{0,j}^c, φ_0).

(Footnote 1: Given an ellipse P_j, the function ρ̃_j(φ) can be computed by straightforward trigonometry as ρ̃_j(φ) = ab/√(b^2 cos^2(φ − φ_a) + a^2 sin^2(φ − φ_a)), where a and b denote the major and minor semiaxes of the ellipse, respectively, and φ_a is the rotation of the major semiaxis relative to φ_0.)

In addition to avoiding collisions with static obstacles, each agent should avoid collisions with other members of the team by maintaining a sufficiently large distance between itself and the other agents. From the perspective of the ith agent, the remaining agents, i.e., j = 1, . . . , N, j ≠ i, can be considered as dynamic obstacles. Bearing this in mind, the agent avoidance region of the ith agent may be described by mimicking and adapting the ideas of Definitions 1 and 2.
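The polar-radius formula of the footnote and the collision test of Definition 2 translate directly into code. The sketch below is illustrative only: the angle φ is measured in the world frame (a simplifying assumption about the reference (p_{0,j}^c, φ_0)), with a, b the major and minor semiaxes and φ_a the rotation of the major semiaxis.

```python
import numpy as np

def ellipse_polar_radius(a, b, phi, phi_a=0.0):
    """Radius of an ellipse with semiaxes a >= b, measured from its center
    at polar angle phi: a*b / sqrt((b*cos(phi-phi_a))^2 + (a*sin(phi-phi_a))^2)."""
    t = phi - phi_a
    return a * b / np.hypot(b * np.cos(t), a * np.sin(t))

def collides_with_obstacle(x_i, p_c, r_i, a, b, phi_a=0.0):
    """Collision test in the spirit of Definition 2: agent i (safety radius
    r_i) collides when ||x_i - p_c||^2 <= (r_i + rho_j(phi))^2."""
    d = np.asarray(x_i, dtype=float) - np.asarray(p_c, dtype=float)
    phi = np.arctan2(d[1], d[0])
    return d @ d <= (r_i + ellipse_polar_radius(a, b, phi, phi_a)) ** 2
```

Along the major axis (φ = φ_a) the radius is a; perpendicular to it, the radius is b, as expected.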

Definition 3: Given a time instant t̄ ≥ 0, consider the open sets
The agent avoidance region of the ith agent at t̄, denoted D_i^t̄, is defined as the union of these sets. A collision between two agents may now be defined.
Let D̄_i^t̄ denote the complement of the set D_i^t̄ and, similarly, let S̄ denote the complement of S. Then, a collision-free trajectory for the ith agent is defined as follows.
Remark 2: Definitions 3, 4, and 5 are provided, for simplicity, considering circular geometries around each agent, namely the sets D_ij^t̄ are circles centered at x_i. Different and more complex geometries can easily be accounted for by modifying Definitions 3 and 4.

Remark 3: The presence of dynamic obstacles may be accommodated by mimicking the definitions concerning collisions between two different agents.
In what follows, we assume that sufficient time is provided to accomplish the task of steering each agent from its initial position x_i(0) to its corresponding target position x_i^*. In a more practical scenario, in which a sequence of desired target positions x_i^{*,k}, k = 1, 2, . . ., is assigned to each agent, this assumption requires that the rate at which the sequential tasks are assigned is sufficiently slow to let each agent accomplish the previous task, or at least be steered arbitrarily close to the desired target position. According to the above discussion, the multi-agent collision avoidance problem can then be formulated as an infinite-horizon, noncooperative, nonzero-sum differential game, as detailed in the following statements. This formulation allows us to simultaneously deal with the primary goal of reaching the desired position x_i^* and the secondary, though unavoidable, objective of avoiding collisions.
Problem 1: Consider a multi-agent system consisting of N > 1 agents with dynamics (1), for i = 1, . . . , N. The multi-agent collision avoidance problem consists in determining feedback control strategies u_i, i = 1, . . . , N, that steer each agent from its initial position to a predefined target while avoiding collisions.
Problem 1 can be recast in the framework of differential games as done in the following statement.
Problem 2: Consider a multi-agent system consisting of N agents with dynamics (1), for i = 1, . . . , N, and let x̃ = [x̃_1, . . . , x̃_N], whose dynamics follow from (1). Each agent seeks to minimize an individual cost functional J_i, given in (4), penalizing the distance from the target, the proximity to obstacles and to the other agents, and the control effort, respectively.

Remark 4: The differential game formulation has been preferred to an optimal control approach for several reasons. First of all, in the latter case a single value function is sought, hence providing a cumulative index of performance for the entire group of agents, whereas in the game-theoretic scenario an individual value function is associated with each agent, thus allowing for a more detailed analysis of the effectiveness of the derived solution. It can also be shown that the control law obtained with the game-theoretic approach is a solution according to the notion of feedback Stackelberg equilibrium, in addition to that of feedback Nash equilibrium. This implies that, should the planning be performed, for any reason, sequentially for each agent, e.g., in the presence of delays, the solution proposed in this paper remains an equilibrium solution to the collision avoidance problem [11], [43]. This feature is particularly appealing when considering an extension towards a decentralized implementation of the approach.
The functions g_i^s(x̃) and g_i^d(x̃) are barrier functions penalizing the ith agent for approaching the static obstacles or the other agents, respectively; hence, they can be considered as obstacle collision avoidance and agent collision avoidance functions, respectively. In the following, inverse barrier functions, given in (6), are considered for g_i^s and g_i^d, with c > 0. Note that alternative definitions for the two functions are possible (see, for example, [7], [44]). A control design approach to the multi-agent collision avoidance problem then consists in determining the feedback Nash equilibrium strategies of the players, namely the set of strategies (u_1^*, . . . , u_N^*) satisfying (7) for all u_i ≠ u_i^*, i = 1, . . . , N, and rendering the zero equilibrium of the closed-loop system locally asymptotically stable. This, in fact, ensures that each agent reaches its target position without entering its obstacle avoidance and agent avoidance regions.
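Since the exact expressions (6) are not reproduced above, the following sketch illustrates the inverse-barrier idea with a hypothetical form c/(d^2 − d_safe^2): the penalty is finite while the squared distance exceeds the squared safety distance and grows without bound as the safety boundary is approached.

```python
import numpy as np

def inverse_barrier(dist_sq, safe_sq, c=1.0):
    """Generic inverse barrier c / (dist^2 - safe^2): finite outside the
    safety region, unbounded as its boundary is approached. This is a
    plausible stand-in for (6), not the paper's exact expression."""
    margin = dist_sq - safe_sq
    return c / margin if margin > 0 else np.inf

def g_d(x, i, radii, c=1.0):
    """Agent-avoidance penalty for agent i: sum of inverse barriers over
    the pairwise distances to the other agents (x is an (N, 2) array)."""
    total = 0.0
    for j in range(len(x)):
        if j != i:
            d2 = float(np.sum((x[i] - x[j]) ** 2))
            total += inverse_barrier(d2, (radii[i] + radii[j]) ** 2, c)
    return total
```

With two agents 10 units apart and safety radii r_1 = r_2 = 1, the penalty is the small value 1/96; it diverges as the separation approaches r_1 + r_2 = 2.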
Remark 5: The inequalities (7) describe the Nash equilibrium solution of the differential game. The so-called ε-Nash equilibrium solution [39] is an approximate solution to the problem. This is a set of strategies u_1^*, . . . , u_N^* which renders the zero equilibrium of the closed-loop system asymptotically stable and guarantees that, if one agent deviates from its ε-Nash equilibrium strategy, its gain is bounded from above by a constant ε > 0, i.e., the set of strategies satisfies the inequalities J_i(u_1^*, . . . , u_i^*, . . . , u_N^*) ≤ J_i(u_1^*, . . . , u_i, . . . , u_N^*) + ε, for some ε > 0, where u_i ≠ u_i^* and the set of strategies u_1^*, . . . , u_i, . . . , u_N^* is such that the zero equilibrium of the closed-loop system is asymptotically stable, for i = 1, . . . , N.
Remark 6: In [39], an alternative definition of approximate solution for a differential game has been introduced. Suppose that the set of strategies u_1^*, . . . , u_N^* renders the zero equilibrium (locally) asymptotically stable. The set of strategies is then said to be an ε_α-Nash equilibrium solution for the differential game if

III. MULTI-AGENT COLLISION AVOIDANCE
In this section, we discuss the control design technique proposed to solve the multi-agent collision avoidance problem. Since Nash equilibria for the differential game introduced in Problem 2 cannot be easily obtained, a systematic method is provided for constructing feedback control laws which satisfy partial differential inequalities (PDIs), instead of equations, and lead to ε_α-Nash, instead of Nash, equilibria. The method requires only the solution of matrix algebraic inequalities, which is provided in closed form. It is shown that the constructive design methodology, which leads to approximate solutions of the differential game in Problem 2, yields a solution to the original Problem 1.

A. Hamilton-Jacobi-Isaacs (HJI) PDIs and ε-Nash Equilibria
The HJI PDEs associated with the differential game described by the cost functionals (4) and the dynamics (1), for i = 1, . . . , N, must be considered toward the construction of Nash equilibrium strategies: individual value functions V_i : R^{2N} → R, i = 1, . . . , N, satisfying the coupled nonlinear PDEs (8), with V_i > 0 and V_i(0) = 0, i = 1, . . . , N, must be found [30], [31], [45]. Provided a solution to the PDEs (8) can be determined, the Nash equilibrium strategy of the ith agent is given by (9). Equations (8), i = 1, . . . , N, do not readily admit closed-form solutions and it is of interest to settle for an approximate solution of the differential game. In what follows, it is shown that an approximate solution to the problem (in terms of ε- or ε_α-Nash equilibrium solutions) can be determined in a systematic manner by considering the immersion of the original dynamics in a higher-dimensional state space. It can be shown, see, e.g., [39], that ε_α-Nash equilibrium solutions are related to PDIs. Toward this end, consider the HJI PDIs (10). A set of strategies satisfying these inequalities solves the multi-agent collision avoidance problem, i.e., collision-free motion is achieved while the agents maneuver to reach their goals. In fact, such a set of strategies constitutes a local ε_α-Nash equilibrium solution for the differential game described in Problem 2. However, solving the PDIs (10) may still be a daunting task. Thus, in Section III-B, we show how a solution to the inequalities (10) can be systematically constructed in an extended state space by relying merely on the solution of a system of matrix inequalities.

B. Algebraic P̄ Matrix Solutions
In this section, a procedure to systematically construct a set of dynamic strategies solving the PDIs (10), instead of the HJI PDEs (8), i = 1, . . . , N, is presented. The method, introduced in [37]-[39], relies on the notion of an algebraic P̄ matrix solution and a dynamic extension ξ, which is common to all agents.

Exploiting the notion of algebraic P̄ matrix solution, define the functions (13)
with ξ ∈ R^{2N}, where R_i = R_i^⊤ > 0, i = 1, . . . , N; these functions are locally positive definite around the origin for any R_i. The partial derivatives of V_i(x, ξ) are given by (14), where Ψ_i(x, ξ) is the Jacobian matrix of the mapping ½ P_i(ξ)x. We recall one of the main results of [39], which characterizes the properties of the extended value functions V_i in (13).
Theorem 1 ([39]): Consider the system (1) and the cost functionals (4). Let P_i, i = 1, . . . , N, be algebraic P̄ matrix solutions of (8). Then, there exist k̄ ≥ 0, R_i = R_i^⊤ > 0, i = 1, . . . , N, and a neighborhood Ω ⊆ R^{2N} × R^{2N} of the origin such that the dynamic strategies (15) satisfy the inequalities (16) for all (x, ξ) ∈ Ω and for all k > k̄. The dynamic feedback strategies (15) are such that the trajectories of the system (1)-(15) asymptotically converge to the origin and constitute an ε_α-Nash equilibrium solution for the differential game described in Problem 2.
Theorem 1 entails that obtaining a solution to Problem 2 boils down to determining algebraic P̄ matrix solutions, i.e., matrix-valued functions satisfying (11). Using algebraic P̄ matrix solutions, dynamic control strategies are designed which, by construction, satisfy the PDIs (16) locally. Note that the method does not necessitate solving the PDEs (8) or the PDIs (16) directly.

C. Feedback Design Methodology
It is assumed that the following conditions are satisfied by the initial configurations of the agents.
Assumptions 3 and 4 ensure that the goals of the agents, i.e., x_i^*, i = 1, . . . , N, are feasible, namely that the target positions do not force collisions with obstacles or between agents. Finally, it is assumed throughout the paper that the static obstacles do not form an impermeable boundary around the targets of one or more of the agents; without this assumption the problem is infeasible. In the following, the notation A = [A_ij] is used as shorthand for the block matrix with block entries A_ij. Consider now the dynamic extension ξ = [ξ_1, . . . , ξ_N] ∈ R^{2N}, where ξ_i ∈ R^2, i = 1, . . . , N, introduced in the previous section, and the matrix-valued functions P_1(x), . . . , P_N(x), with P_i(x) ∈ R^{2N×2N}, i = 1, . . . , N, given by (17)-(18), where P_{kj}^i ∈ R^{2×2}, k = 1, . . . , N, j = 1, . . . , N, γ_i > 0 is a constant parameter, and P_{kj}^i = 0 for k ≠ i and j ≠ i. Define the set M = {ξ ∈ R^{2N} : g_i^s(ξ) + g_i^d(ξ) < ∞, i = 1, . . . , N}. Note that the functions V_i in (13) are positive definite for all (x, ξ) ∈ R^{2N} × M. Consider, in addition, a partition of the matrix R_i as R_i = [R_{kj}^i], k = 1, . . . , N, j = 1, . . . , N. Adopting the above notation, the following theorem shows that the functions P_i, i = 1, . . . , N, defined in (17)-(18) constitute algebraic P̄ matrix solutions of (8) and, consequently, that the dynamic control laws (15), with V_i as in (13) and P_i as in (17)-(18), solve the multi-agent collision avoidance problem.
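The block structure of P_i in (17)-(18), with 2×2 blocks P_{kj}^i that vanish unless k = i or j = i, can be visualized with a small sketch (only the Boolean sparsity pattern is built here, not the actual block entries):

```python
import numpy as np

def p_sparsity(N, i):
    """Boolean sparsity pattern of P_i: the 2x2 blocks P^i_{kj} are zero
    unless k = i or j = i, i.e., a block 'cross' through row/column i."""
    mask = np.zeros((2 * N, 2 * N), dtype=bool)
    mask[2 * i:2 * i + 2, :] = True   # block row i
    mask[:, 2 * i:2 * i + 2] = True   # block column i
    return mask
```

For N agents, P_i thus has at most 4(2N) − 4 potentially nonzero scalar entries, which is what makes the construction scale mildly with the team size.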
Similarly to Theorem 1, the following result hinges upon the existence of a certain compact set Ω, on which it is first shown that (16) holds pointwise with respect to x. This property is then employed to show that the closed-loop trajectories, as functions of time, do not leave this set.
Proof: The proof consists of two steps. First, it is shown that the matrices P_i(x), i = 1, . . . , N, constitute an algebraic P̄ matrix solution for the differential game associated with Problem 2; it then follows that the dynamic control strategies (19), i = 1, . . . , N, solve the inequalities (16) and thus constitute an ε_α-Nash equilibrium solution for the differential game [37]-[39]. Second, it is shown that, provided Assumptions 1-4 are satisfied, the agents converge to the desired targets while avoiding collisions.
It is evident that the initial condition ξ(0) of the dynamic extension is of importance for the solution of the differential game, namely to ensure that the trajectories of the dynamic extension do not leave the set M. Toward this end, a reasonable criterion for such a selection is to let ξ(0) be such that g_i^s(ξ(0)) + g_i^d(ξ(0)) is bounded for all i = 1, . . . , N, i.e., ξ(0) ∈ M. The following result shows that this choice is in fact sufficient to guarantee that the trajectories do not leave M, since such a set is positively invariant with respect to the dynamics (1)-(19).
Proposition 1: Suppose Assumptions 1 and 2 are satisfied. If the initial condition of the dynamic extension is selected such that ξ(0) ∈ M, then g_i^s(ξ(t)) + g_i^d(ξ(t)) < ∞ for all t > 0, which implies ξ(t) ∈ M for all t > 0, i.e., the set M is positively invariant.
Remark 7: It is easy to imagine situations in which a deadlock between agents could occur: intuitively, "symmetric" scenarios could end in a deadlock, as seen in [18]. Whereas approaches using (static) navigation functions are typically convergent almost everywhere, the approach adopted herein ensures local convergence, since W is a Lyapunov function establishing local asymptotic stability of the origin of the extended state (x, ξ), thus excluding the presence of saddle points (which cause deadlocks) in a neighborhood of the equilibrium.
Remark 8: Although the collision avoidance functions (6), i = 1, . . . , N, are unbounded when the denominators in (6) are zero, the closed-loop system (3)-(19) is such that, provided the denominators are greater than zero initially, they remain greater than zero for all time. Thus, the dynamic control strategies (19) are bounded at all times. This is a direct consequence of Proposition 1: x(0) and ξ(0) are such that P_i(ξ(0)) and V_i(x(0), ξ(0)) (and thus also J̃_i(x(0), ξ(0), u_1, . . . , u_N)), i = 1, . . . , N, are bounded in the neighborhood of the origin in which the inequalities (16) are satisfied. Boundedness of the control efforts is then implied by the definition of the cost functionals in (20).

IV. SIMULATIONS
Two illustrative examples are presented in this section. In both cases, the differential game corresponding to the problem associated with the agents is solved using Theorem 2 and, for the collision avoidance functions (6), the parameter c = 1 has been used. In Figs. 1 and 2, the arrows indicate the direction of motion and the circular markers denote the initial positions of the agents.

A. Two Agents Maneuvering Through a Narrow Path
Consider the case in which there are N = 2 agents which are to exchange positions: the initial positions of the agents are x_1(0) = [−30, 0]^⊤ and x_2(0) = [30, 0]^⊤, whereas their target positions are x_1^* = x_2(0) and x_2^* = x_1(0). The parameters associated with the agents are α_1 = α_2 = 1, β_1 = β_2 = 0.1, and r_1 = r_2 = 1. Their paths are blocked by two circular obstacles of radius 2, centered at (0, 4) and (0, −3.5). Note that the obstacles are such that both agents cannot pass between the two obstacles simultaneously. The remaining parameters have been selected as follows: γ_1 = 4, γ_2 = 0.5, R_1 = R_2 = I, k = 0.4, and ξ(0) = [60, −5, 240, 7]^⊤. The trajectories of the first (black line) and second (gray line) agents are shown in Fig. 1, where the gray circular regions indicate the static obstacles and the dotted circles indicate the safety radius of each agent at the points along the trajectories at which the agents are closest to one another or to the obstacles.
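Since the dynamic strategies (19) rely on machinery not reproduced here, the following stand-in simulation of this scenario uses a plain saturated gradient-descent law (attraction to the target plus inverse-barrier repulsion); the gains k_att, k_rep, and v_max are hypothetical. Consistently with Remark 7, this naive law may deadlock in the symmetric exchange instead of completing it, which is precisely the behavior the proposed method excludes locally.

```python
import numpy as np

def grad_barrier(x, p, safe):
    """Gradient of 1/(||x - p||^2 - safe^2) with respect to x."""
    d = x - p
    m = d @ d - safe ** 2
    return -2.0 * d / m ** 2 if m > 1e-9 else -1e6 * d

# Scenario of this subsection (positions, targets, obstacles, radii from the
# text). The control law is a gradient-descent stand-in, NOT strategies (19).
x = np.array([[-30.0, 0.0], [30.0, 0.0]])
targets = np.array([[30.0, 0.0], [-30.0, 0.0]])
obstacles = [(np.array([0.0, 4.0]), 2.0), (np.array([0.0, -3.5]), 2.0)]
r = [1.0, 1.0]
dt, k_att, k_rep, v_max = 0.01, 1.0, 5.0, 2.0  # hypothetical gains

for _ in range(20000):
    u = np.zeros_like(x)
    for i in range(2):
        u[i] = -k_att * (x[i] - targets[i])               # attraction to goal
        for p, rho in obstacles:                          # obstacle repulsion
            u[i] -= k_rep * grad_barrier(x[i], p, r[i] + rho)
        j = 1 - i                                         # inter-agent repulsion
        u[i] -= k_rep * grad_barrier(x[i], x[j], r[i] + r[j])
        u[i] = np.clip(u[i], -v_max, v_max)               # saturate velocity
    x = x + dt * u                                        # forward Euler on (1)

print("final positions:", np.round(x, 2))
```

In this symmetric setup the agents typically stall head-to-head near the gap between the obstacles while remaining collision-free, illustrating the saddle-point deadlock that the Lyapunov-based design of Section III rules out in a neighborhood of the equilibrium.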

V. CONCLUSION
In this paper, the problem of maneuvering a team of agents from given initial positions to predefined target positions, while avoiding both interagent collisions and collisions with static obstacles, is considered. For agents with single-integrator dynamics, the problem is formulated as an infinite-horizon, nonzero-sum differential game. Obtaining feedback Nash equilibrium solutions for the differential game involves solving a system of coupled PDEs, for which closed-form solutions cannot be easily found. A systematic method of constructing approximate solutions to the problem, based on the approach developed in [39], is proposed in this paper. The theory is demonstrated on a series of illustrative examples. Future work includes considering the problem in which there is limited communication between the agents. It is also of interest to extend the results to problems in which the agents seek to achieve trajectory tracking instead of simply reaching static targets.