NUMERICAL SOLUTION OF A PURSUIT-EVASION DIFFERENTIAL GAME INVOLVING TWO SPACECRAFT IN LOW EARTH ORBIT

This paper considers a spacecraft pursuit-evasion problem taking place in low earth orbit. The problem is formulated as a zero-sum differential game in which there are two players, a pursuing spacecraft that attempts to minimize a payoff, and an evading spacecraft that attempts to maximize the same payoff. We introduce two associated optimal control problems and show that a saddle point for the differential game exists if and only if the two optimal control problems have the same optimal value. Then, on the basis of this result, we propose two computational methods for determining a saddle point solution: a semi-direct control parameterization method (SDCP method), which is based on a piecewise-constant control approximation scheme, and a hybrid method, which combines the new SDCP method with the multiple shooting method. Simulation results show that the proposed SDCP and hybrid methods are superior to the semi-direct collocation nonlinear programming method (SDCNLP method), which is widely used to solve pursuit-evasion problems in the aerospace field.


(Communicated by Song Wang)
Abstract. This paper considers a spacecraft pursuit-evasion problem taking place in low earth orbit. The problem is formulated as a zero-sum differential game in which there are two players, a pursuing spacecraft that attempts to minimize a payoff, and an evading spacecraft that attempts to maximize the same payoff. We introduce two associated optimal control problems and show that a saddle point for the differential game exists if and only if the two optimal control problems have the same optimal value. Then, on the basis of this result, we propose two computational methods for determining a saddle point solution: a semi-direct control parameterization method (SDCP method), which is based on a piecewise-constant control approximation scheme, and a hybrid method, which combines the new SDCP method with the multiple shooting method. Simulation results show that the proposed SDCP and hybrid methods are superior to the semi-direct collocation nonlinear programming method (SDCNLP method), which is widely used to solve pursuit-evasion problems in the aerospace field.
1. Introduction. Pursuit-evasion problems, in which opposing decision-makers interact within a complex dynamic environment, are challenging problems to solve. In this paper, we consider a pursuit-evasion problem involving two spacecraft acting under independent control: the pursuing spacecraft attempts to minimize a certain payoff function, while the evading spacecraft attempts to maximize the same function. This pursuit-evasion problem can be formulated as a zero-sum differential game, of the type first introduced by Issacs in 1965 [13].
Friedman [8] and Berkovitz [2,3] have derived necessary optimality conditions and established the existence of open-loop saddle point trajectories for zero-sum differential games. However, for differential games in the aerospace field, such as the spacecraft pursuit-evasion problem considered in this paper, it is usually impossible to derive analytical solutions [26,27]. Hence, numerical methods are indispensable.

SONGTAO SUN, QIUHUA ZHANG, RYAN LOXTON AND BIN LI
Two types of numerical methods have been proposed for solving spacecraft pursuit-evasion problems: the multiple shooting approach [4], which involves solving a two-point boundary value problem formed from the necessary optimality conditions of the corresponding differential game, and the semi-direct approach [11], which involves replacing the differential game with a corresponding optimal control problem. Multiple shooting methods give solutions of high accuracy [4], but their regions of convergence are usually very narrow [28]. Semi-direct methods, on the other hand, converge for a wide range of initial guesses, but they are usually not as accurate as multiple shooting methods, because the optimal control formulation of the differential game cannot be solved exactly. One of the most well-known semi-direct methods is the semi-direct collocation nonlinear programming method (SDCNLP method) proposed by Conway and his collaborators [11,12,23,24]. This method involves solving the optimal control formulation of the differential game using a collocation method. Although the SDCNLP method often works well in practice, it has two disadvantages: (i) solution accuracy is usually far less than that of the multiple shooting method; and (ii) the optimal control problem being solved is not precisely equivalent to the original differential game.
In this paper, we propose two new computational methods -a semi-direct control parameterization method (SDCP method) and a hybrid method -to overcome the disadvantages of the SDCNLP method proposed by Conway et al. Our new SDCP method involves introducing two optimal control problems corresponding to the differential game, whereas the SDCNLP method relies on just one. We give a precise proof to show that solving these optimal control problems in succession is equivalent to solving the differential game. Similar results are not available for the SDCNLP method. Moreover, instead of using the collocation method to solve the optimal control formulations of the differential game (as suggested by Conway et al.), we use the control parameterization method [16,18,19,30]. The advantage of control parameterization is that it results in a finite-dimensional approximate problem of much smaller dimension [15,17]. After introducing the SDCP method, we present a hybrid method that combines the SDCP and multiple shooting methods to improve solution accuracy.
The remainder of this paper is organized as follows. In Section 2, we formulate the spacecraft pursuit-evasion problem under consideration as a zero-sum differential game. Then, in Section 3, we give conditions for solution existence for this differential game, and also review the necessary conditions for optimality. The SDCP and hybrid methods are introduced in Section 4. Finally, in Section 5, these new methods are compared with the existing SDCNLP method via a numerical example.
2. The pursuit-evasion problem. Consider a pursuit-evasion problem involving two spacecraft (the players). Let P denote pursuing spacecraft and let E denote the evading spacecraft. Each spacecraft is governed by a dynamic system with its own control variables. The dynamic systems for players P and E (which are uncoupled) are given byẋ x where x P (t) is the state of P , x E (t) is the state of E, u P (t) is the control of P , and u E (t) is the control of E. Here, g P (t, x P (t)) and g E (t, x E (t)) are continuously NUMERICAL SOLUTION OF A PURSUIT-EVASION PROBLEM   3 differentiable vector-valued functions, and B P (t) and B E (t) are continuous matrixvalued functions depending on t. Furthermore, the controls u P (t) and u E (t) belong to given admissible strategy sets U P and U E , respectively. The initial conditions for system (1) are where t 0 is a given initial time.
We consider the following Mayer objective functional: where the terminal time t f is fixed and Φ is a given continuously differentiable function. The spacecraft pursuit-evasion problem is defined as follows.
Problem (G). Subject to Eqs. (1) and (2), choose u P ∈ U P to minimize the payoff (3), and choose u E ∈ U E to maximize the payoff (3).
The payoff function (3) can often be viewed as a measure of spacecraft separation. Thus, in Problem (G), the pursuing spacecraft wants to minimize the separation, while the evading spacecraft wants to maximize the separation.
3. Preliminary results. Problem (G) is a zero-sum differential game in which the two players have competing objectives: player P wants to minimize the payoff, while player E wants to maximize the same payoff. Because of these competing objectives, the concept of "solution" to Problem (G) needs to be carefully considered. In this section, we give existence conditions for a solution of Problem (G). The necessary optimality conditions for Problem (G) will also be reviewed.
3.1. Solution existence for Problem (G). Throughout this paper, we assume that the following condition is satisfied.
Assumption 3.1. The admissible strategy sets U P and U E are compact sets in some metric space, and J is a continuous function from U P × U E → R.
To solve Problem (G), we seek control strategies u * P ∈ U P and u * E ∈ U E such that max Any pair (u * P , u * E ) satisfying (4) is called a saddle point for Problem (G). Note that there could be multiple saddle points, and any such saddle point is a valid solution of Problem (G) [1]. Note also that if Eq. (4) holds, then for any pair of control strategies (u P , u E ) ∈ U P × U E , . To proceed further, we need the following definitions.
Definition 3.2. For Problem (G), the set of best reply strategies for player P , given a fixed strategy u E ∈ U E for player E, is defined as Similarly, the set of best reply strategies for player E, given a fixed strategy u P ∈ U P for player P , is defined as

SONGTAO SUN, QIUHUA ZHANG, RYAN LOXTON AND BIN LI
Note that S P (u E ) contains the best reply strategies for P when E uses the fixed strategy u E , and S E (u P ) contains the analogous strategies for E. Definition 3.3. For any given u E ∈ U E , let u P (u E ) ∈ S P (u E ) be a corresponding best reply strategy for player P . Similarly, for any given u P ∈ U P , let u E (u P ) ∈ S E (u P ) be a corresponding best reply strategy for player E. Then the lower value of the zero-sum game is defined as Similarly, the upper value of the zero-sum game is defined as Definition 3.4. If V + and V − are both finite, and V + = V − = V , then V is called the value of the differential game.
Before we can characterize saddle point existence in Problem (G), we need the following lemmas.
Lemma 3.5. The lower value of the zero-sum game is no greater than the upper value of the zero-sum game. Mathematically, Proof. According to the definition of V − , for each ε > 0, there exists a correspond- where u P (u ε E ) ∈ S P (u ε E ) is a best reply strategy for P corresponding to u ε E . Then, Since ε > 0 was selected arbitrarily, we immediately deduce V − ≤ V + . This completes the proof.
Lemma 3.6. Let u n P be a sequence of control strategies for player P , and let u E (u n P ) be a corresponding sequence of best reply strategies for player E. Suppose that (u n P , u E (u n P )) → (ū P ,ū E ) as n → ∞, where (ū P ,ū E ) ∈ U P × U E . Thenū E is a best reply strategy for player E corresponding toū P .
There are two cases to consider: (i) J(ū P ,ū E ) > J(ū P , u E (ū P )); and (ii) J(ū P ,ū E ) = J(ū P , u E (ū P )). But case (i) is clearly impossible, as u E (ū P ) is a best reply strategy for player E corresponding toū P . Thus, we must have This shows thatū E ∈ S E (ū P ), as required. 5 We also have the following analogue to Lemma 3.6 for a sequence of control strategies for player E.

NUMERICAL SOLUTION OF A PURSUIT-EVASION PROBLEM
Lemma 3.7. Let u n E be a sequence of control strategies for player E, and let u P (u n E ) be a corresponding sequence of best reply strategies for player P . Suppose that (u P (u n E ), u n E ) → (ū P ,ū E ) as n → ∞, where (ū P ,ū E ) ∈ U P × U E . Thenū P is a best reply strategy for player P corresponding toū E .
Proof. Similar to the proof of Lemma 3.6.
We now present the following existence condition, which characterizes the solution of Problem (G).
Proof. Suppose that a saddle point (u * P , u * E ) exists. Since we already know from where u E (u ε P ) is a best reply strategy for the evader corresponding to u ε P . Since U P and U E are compact (recall Assumption 3.1), we can choose a subsequence ε n → 0 such that the corresponding subsequence of strategy pairs (u εn P , u εn E ), and the corresponding sequence of best reply strategies (u P (u εn E ), u E (u εn P )), both have limit points. That is, there exists strategy pairs (u * . Therefore, from (5) and (6), According to Lemmas 3.6 and 3.7,ū * P is a best reply strategy for P corresponding to u * E , andū * E is a best reply strategy for E corresponding to u * P . Thus, Since V − = V + , this shows that (u * P , u * E ) is a saddle point for Problem (G).

Necessary optimality conditions for Problem (G).
In this section, we briefly review the necessary optimality conditions for Problem (G). For more details, we refer readers to [8].
Define the Hamiltonian H as where λ P and λ E are the adjoint variables for P and E, respectively. Any saddle point solution for Problem (G) must satisfy the following equations: where the adjoint variables λ P and λ E satisfy the differential equations, with the terminal conditions Note that, in view of equation (7a), the optimal strategy for the pursuer depends solely on λ P . Similarly, the optimal strategy for the evader depends solely on λ E . If the optimal strategies for players P and E can be expressed as explicit functions of λ P and λ E , then these functions can be substituted into (1) and (8) to yield a two-point boundary value problem (TPBVP) consisting of (1), (2), (8) and (9). In principle, the multiple shooting method [28] can be applied to solve this TP-BVP. However, good initial approximations of the optimal trajectories and adjoint variables are required. We refer readers to [28] for more details on the multiple shooting method. In this paper, we propose two new computational methods for solving Problem (G), the details of which are discussed in the next section.
4. Numerical solution methods. In this section, we propose two numerical methods for solving Problem (G). In contrast with the traditional semi-direct methods in the literature (see, for example, the work by Horie [11]), our new methods involve solving two optimal control problems instead of one. The advantage of our new approach is that it is guaranteed to yield a saddle point solution of Problem (G) (assuming one exists), whereas the optimal control formulation in the traditional semi-direct approach is not precisely equivalent to the original differential game.

4.1.
Optimal control formulations. We assume that the maximization condition (7b) can be used to express the optimal strategy for player E as a unique function of the adjoint variable λ E (t): Substituting (10) into (1), and appending the adjoint dynamics for λ E , yields the following extended system: with the initial conditions and the terminal conditions Note that u P completely determines the state trajectory for x P via (11a) and (12a). By substituting the resulting value of x P (t f ) into (13), we obtain a corresponding two-point boundary value problem for x E and λ E consisting of (11b), (11c), (12b) and (13). We now define the following optimal control problem.
Problem (O P ). Subject to the extended system (11)-(13), choose u P ∈ U P to minimize the payoff J.
In Problem (O P ), λ E is viewed as an additional state variable subject to the terminal state constraint (13). The following result reveals the relationship between Problem (O P ) and Problem (G). Proof. Suppose player P uses the fixed strategy u P ∈ U P . Then player E's goal is to solve the following optimal control problem to determine the corresponding best reply strategy: x E (t 0 ) = x 0 E , where x P is the solution of (11a) and (12a) corresponding to the fixed strategy u P . The optimal control for this problem satisfies, Sinceū * E is the unique control that satisfies these conditions, it must be the best reply strategy corresponding to u P . Thus, the optimal value for Problem (O P ) is as required. This completes the proof.
We now introduce the analogous optimal control problem for player E. To do this, we assume that the minimization condition (7a) can be used to derive the optimal strategy function for player P as a unique function of the adjoint variable λ P : Substituting (14) into (1) and appending the adjoint dynamics for λ P yields the following extended system: with the boundary conditions The following optimal control problem is the analogue of Problem (O P ) for player E.
Problem (O E ). Subject to the extended system (15)- (16), choose u E ∈ U E to maximize the payoff J.
We have the following result, which links Problem (O E ) with Problem (G).  We now present the main result for this section. Letū * E be the best reply strategy for player E corresponding to u * P , and letū * P be the best reply strategy for player P corresponding to u * E . According to Propositions 4.1 and 4.2, Combining (17) and (18) yields This shows that (u * P , u * E ) is a saddle point for Problem (G).
On the basis of Theorem 4.4, Problem (G) can be solved using the following procedure: 1. Solve Problem (O P ) to obtain an optimal control u * P . To implement this procedure successfully, we require a numerical method for solving Problems (O P ) and (O E ). This is the focus of the next section.

4.2.
Semi-direct control parameterization method. Both Problems (O P ) and (O E ) are optimal control problems in canonical form. Since these problems are similar, we will restrict our attention to Problem (O P ) in this section.
To solve Problem (O P ) numerically, we will apply the control parameterization technique [19,29]. This technique involves partitioning the time horizon [ t 0 , t f ] into a set of subintervals [τ j−1 , τ j ), where τ j , j = 1, . . . , q, are fixed knot points such that t 0 = τ 0 < τ 1 < · · · < τ q = t f . Then, the control function u P is approximated by a piecewise-constant function as follows: where χ [ τj−1,τj ) (t) is the characteristic function defined by 0, otherwise. Substituting (19) into (11) giveṡ Since the initial value of λ E is unknown, we introduce a decision vector ζ such that λ E (t 0 ) = ζ. Let E denote the set consisting of all suitable vectors for ζ. Then the initial conditions for (20) are Then, the terminal condition (13) can be expressed as the following terminal state constraint: where · 2 denotes the Euclidean norm. Furthermore, let Ξ denote the set of all vectors in the form of (22) such that the corresponding approximate control defined by (19) belongs to U P .
In view of the dynamic system (20)-(21), the payoff can be expressed as a function of σ and ζ: We now define the following finite-dimensional approximation of Problem (O P ).
Problem (O q P ) can be solved as a nonlinear programming problem using standard constrained optimization methods, such as sequential quadratic programming (SQP). To do this, we need the gradient formulas for the payoff J q and the constraint function G q . These gradient formulas are given in the following theorems, which can be proved in a similar manner to Theorem 5.2.1 in [29].
where the adjoint variable η = [η T P , η T E , η T λ ] T satisfies the differential equationṡ with the terminal conditions Theorem 4.6. The gradients of the constraint function G q (σ, ζ) with respect to σ and ζ are given by where the adjoint variable µ = [µ T P , µ T E , µ T λ ] T satisfies the differential equationṡ with the terminal conditions The gradient formulas in Theorems 4.5 and 4.6 can be incorporated into gradientbased optimization techniques such as SQP to solve Problem (O q P ) numerically. When q is large, the optimal piecewise-constant control is a good approximation of the optimal control for Problem (O P ). In fact, it can be shown that, if u q, * P is the See [20,21,29] for proofs of this result under various conditions. Gradient-based optimization methods can only guarantee local optimal solutions. Thus, the initial point used to start the optimization process is crucial. In the numerical simulations in Section 5.2, we use the multi-objective genetic algorithm (MOGA) in [7] to determine a good starting point.

Hybrid method.
As with any semi-direct method, the key limitation of the SDCP method is that the optimal control formulations of the differential game can, in general, only be solved numerically. Indeed, although the control parameterization method generates approximate solutions that converge to the true solution as the number of subintervals approaches infinity, the limiting solution is usually never achievable in practice. Accordingly, we now propose a hybrid method that combines the SDCP method with the multiple shooting method to improve solution accuracy. In this hybrid method, the SDCP method is first used as pre-processor to generate an approximate saddle point. This approximate saddle point is then used as the initial guess for the multiple shooting method. By using this hybrid strategy, the strong convergence properties of the semi-direct method and the high accuracy of the multiple shooting method can both be achieved. The hybrid algorithm is summarized below.
1. Solve Problem (O P ) numerically using the control parameterization method described in Section 4.2. Let u * P denote the approximate optimal control of Problem (O P ) and let x * E and λ * E denote the corresponding trajectories for player E and the adjoint variable for player E, respectively. 2. Solve Problem (O E ) numerically using the control parameterization method described in Section 4.2. Let u * E denote the approximate optimal control of Problem (O E ) and let x * P and λ * P denote the corresponding trajectories for player P and the adjoint variable for player P , respectively. 3. If the difference between the optimal payoff values of Problems (O P ) and (O E ) exceeds a given tolerance, then stop: no saddle point exists. Otherwise, take (u * P , u * E ) as an approximate saddle point. 4. Starting with the initial trajectories x * P , x * E , λ * P and λ * E , use the multiple shooting method to solve the TPBVP described by Eqs. (1), (2), (8) and (9). 5. Numerical example. In this section, we give a specific model for the spacecraft pursuit-evasion problem in low earth orbit. We then solve this problem using three methods: the new SDCP and hybrid methods presented in Section 4, and the existing SDCNLP method in reference [24]. 5.1. Mathematical model. We consider a pursuit-evasion scenario involving two spacecraft distributed in low earth orbit (altitude of between 160 km and 2000 km).
The coordinate system for the two spacecraft is shown in Fig. 1. The motion of the ith spacecraft (i = P for the pursuer and i = E for the evader) is described by the following differential equations: where s x i , s y i and s z i define the position of the ith spacecraft with respect to the three coordinate axes, and u x i , u y i and u z i are control variables representing the thrust directions along the three coordinate axes. Eqs. (25) are known as the Clohessy-Wiltshire equations [5]. In these equations, T i is the thrust-to-mass ratio (constant) for the ith spacecraft, ω(t) is the angular velocity of the origin O, µ is the earth planetary constant, and r(t) is the distance from the center of the earth to the origin of the coordinate system.
The control variables for player i are subject to the following constraints: Let v x i , v y i and v z i define the velocity of the ith spacecraft with respect to the three coordinate axes. Then, the differential equations (25) can be written as the following first-order dynamic system: The payoff function in this example measures the separation distance between the two spacecraft at the terminal time: The spacecraft pursuit-evasion problem can be stated as follows: subject to the dynamic system (26) with i = P , the pursuing spacecraft wants to choose its control variables u x P , u y P and u z P to minimize the separation distance (27) at the terminal time, and subject to the dynamic system (26) with i = E, the evading spacecraft wants to choose its control variables u x E , u y E and u z E to maximize the separation distance (27) at the terminal time.
Using the minimization condition (7a), the optimal control functions for player P are obtained by solving the following optimization problem: where λ x P , λ y P and λ z P are the adjoint variables corresponding to v x P , v y P and v z P , respectively. Solving this optimization problem yields the optimal controls for player P in terms of the adjoint variables: Similarly, the optimal controls for player E are Using the formulas given above for the optimal control functions for players P and E, we can readily obtain the optimal control problems corresponding to the differential game (i.e., Problems (O P ) and (O E )).

5.2.
Numerical results. According to Eq. (25), the in-plane motion (i.e., the x-y subsystem) and the out-of-plane motion (i.e., the z subsystem) are independent and can be considered separately. Thus, we assume that the pursuit-evasion problem occurs in a co-planar orbit and r(t) = r 0 , where r 0 is the sum of the radius of the earth (6371 km) and the altitude of the origin O. We also assume that the angular velocity ω(t) is constant and equals µ/r 3 0 , where µ = 3.986 × 10 5 km 3 is the earth planetary constant.
We consider eight versions of the pursuit-evasion problem. These versions differ in the initial values of the x-y state variables, the altitude of the origin, and the values of T P and T E . Each version of the problem has a fixed terminal time of t f = 500 seconds.
In versions 1-4 of the pursuit-evasion problem, the altitude of the origin is 500 km; in versions 5-8, the altitude of the origin is 1000 km. The initial values of the x-y state variables for each version are given in Table 1. Moreover, Table 2 gives the Table 1. Initial values of the x-y state variables  values of T P and T E , where the constant g = 9.8 × 10 −3 km/s 2 is the base unit for the thrust-to-mass ratios.
Details of our numerical implementation are given as follows. For the SDCP method, we use the FORTRAN software package MISER 3.2 [14], which is based on the nonlinear programming code NLPQL by Schittkowski [25], to implement the control parameterization procedure with q = 10 subintervals. The multi-objective genetic algorithm code by Deb et al. [7] is used to determine a good starting point for MISER 3.2. For the hybrid method, we use the multiple shooting code BND-SCO [22], with the solution obtained by MISER 3.2 as the starting point. Finally, for the SDCNLP method, we use the Gauss-Lobatto quadrature collocation method and solve the resulting nonlinear programming problem using the FORTRAN code NPSOL [9].
The computer used for our numerical experiments is a server with two 2.27GHz Xeon CPUs and 24 GB memory. For both the SDCP and the SDCNLP methods, we set the constraint accuracy and the optimization tolerance to 10 −9 . Fig. 2-5 show the optimal controls and optimal state trajectories for Problem (O P ) corresponding to versions 1 and 2 of the pursuit-evasion problem. Since the optimal controls and optimal state trajectories for the other versions are similar, they are not shown here.
Optimal control for E In Table 3, we compare the accuracy of the SDCP, hybrid and SDCNLP methods. The errors for the SDCP and SDCNLP methods are measured by the violation of constraint (9); the error for the multiple shooting method is the sum of the constraint errors at each shooting point. The results in Table 3 indicate that, as expected, the hybrid method is considerably more accurate than both the SDCP and the SDCNLP methods.
In Tables 4 and 5, we compare the efficiency of the three methods in terms of CPU time. The results in Table 4 show that the SDCP method requires much less computation time than the SDCNLP method with the same level of precision. When the constraint tolerance is reduced to 10 −16 to match the accuracy of the hybrid method, the SDCNLP method with ten subintervals (and five Gauss-Lobatto quadrature points per subinterval) does not converge. Moreover, we observed that even when the number of subintervals increases, the results from the SDCNLP method are still not as accurate as those from the hybrid method. This is due to finite precision arithmetic [6]; an accumulation of rounding errors begins to dominate solution accuracy as the number of collocation points increases [10]. Table 5 shows the corresponding CPU times for the hybrid method. The stars in the SDCNLP column indicate that when the constraint tolerance is 10 −16 , the SDCNLP method did not converge for any of the eight versions of the pursuit-evasion problem.  6. Conclusions. This paper introduces two computational methods for generating numerical solutions to a class of spacecraft pursuit-evasion differential games. The first method, called the semi-direct control parameterization method (SDCP  , is based on two optimal control formulations of the differential game.
Our main result shows that by solving these optimal control problems, we can determine whether the differential game has a saddle point solution. The second method we presented is a hybrid method that combines the SDCP method with the  multiple shooting method to capture the best qualities of each method-namely, the strong convergence properties of the SDCP method and the accuracy of the multiple shooting method. Our numerical results in Section 5.2 indicate that, for the class of spacecraft pursuit-evasion problems considered in this paper, our new methods outperform the existing SDCNLP method.