Maneuver Planning for Multiple Pursuit Intelligent Surface Vehicles in a Sequence of Zero-Sum Pursuit–Evasion Games

: Unmanned surface pursuit is a complex and challenging engineering problem, especially when conducted by multiple intelligent surface vehicles (ISVs). To enhance the pursuit performance and facilitate strategic interaction during the target pursuit, this paper proposes a novel game theory-based maneuver planning method for pursuit ISVs. Firstly, a specific two-player zero-sum pursuit–evasion game (ZSPEG)-based target-pursuit model is formed. To ensure the vehicles reach a quick consensus, a target-guided relay-pursuit mechanism and the corresponding pursuit payoffs are designed. Meanwhile, under the fictitious play framework, the behavioral pattern and the strategies of the target could be fictitiously learned. Furthermore, mixed-strategy Nash equilibrium (MNE) is employed to determine the motions for the vehicles, the value of which is the best response in the proposed ZSPEG model. Finally, simulations verify the effectiveness of the above methods in multi-ISV surface pursuit.


Introduction
From conceptualization to realization, the utilization of multiple intelligent surface vehicles (ISVs) to conduct surface pursuits has attracted widespread attention [1][2][3][4][5].However, how to effectively and efficiently pursue a moving target still poses a significant challenge for a multi-ISV system.The system necessitates numerous advanced technologies, including target recognition, target positioning, and dynamic maneuver planning [6,7].Focusing on providing an efficient maneuver planner for the multi-ISV pursuit system, this paper aims to address the concerns regarding realizing the strategic interaction and addressing the data scarcity of the target.
There are three mainstream methods in maneuver planning for a multi-vehicle target pursuit: graph theory-based algorithms, the cooperative target hunting approach, and game theory-based methods.Based on graph theory, the prevailing approach is the Voronoi-related algorithm, which could visually represent the relationships among the motion sets for the vehicles [8,9].However, all the graph theory-based algorithms do not directly provide information regarding strategic choices and the interaction between the pursuit vehicles and the target.As for the cooperative target hunting methods, researchers have focused on the formation control and dynamic task allocation during the planning process [10][11][12].Nevertheless, these methods require detailed models for both the vehicles and the environmental information.Game theory, as a mature mathematical theory, has broad applicability and theoretical rigor in maneuver planning [13].It offers advantages such as equilibrium point analysis, strategy optimization, and the handling of incomplete information.To realize intelligent and cognitive interactions between the pursuit team and the target, this paper focuses on developing a novel and practical game theory-based maneuvering planning method.
Under the game theory-based target-pursuit framework, three types of game models, Stackelberg games [14], cooperative games [15], and zero-sum games (ZSGs) [16][17][18][19], demonstrate different characteristics.The Stackelberg game involves a leader-follower model, where the leader takes action first and the follower adjusts their strategy after observation [20].Hence, the Stackelberg game-based model is suitable for simulating the dynamics in a leader-follower structure.However, the complexity of the Stackelberg game is high, requiring sophisticated real-time decision-making.Cooperative games emphasize teamwork [21].In cooperative games, team members depend on each other.The action of one member may affect the entire team [22].The interdependence of the members may result in a decrease in team effectiveness when one team member fails to fulfill their responsibilities.Compared with the two previously mentioned games, ZSGs are more applicable to intense competition [23].In ZSGs, the success of one team inevitably leads to the failure of other teams.ZSGs enable participants to adopt clear confrontational strategies.Moreover, due to the relative simplicity of the ZSG models, decision-making is more explicit and easier to implement, contributing to more effective target pursuit in competitive environments.
The application of ZSGs to target pursuit manifests as zero-sum pursuit-evasion games (ZSPEG).The ZSPEG-based model comprises two players: the pursuer and the evader [16][17][18][19].In refining the ZSPEG-based model to be more specific and tailored, researchers focus on enriching its objective function and considering various involved factors.A divide-and-conquer approach is used for a multi-player ZSPEG where the pursuers have a twofold goal [11].To identify unknown sets of incoming attackers, a multi-model adaptive estimator is implemented in the offline design policy sets [24].Furthermore, a bilateral adaptive parameter estimation method is adopted to deal with a multi-agent ZSPEG model with unknown general quadratic goals [25].In recent years, various algorithms have been developed to solve ZSG-based models, such as regret matching [26], fictitious play [27], double oracle [28], etc.Among them, the most popular algorithms are regret learning-based methods.They rely on the concepts of external regret, internal regret, swap regret, and Nash equilibrium-based regret [29].Building on this foundation, the current mainstream algorithms are the optimistic follow-the-regularized-leader [30] and the optimistic mirror descent algorithm [31].Despite the extensive research on ZSPEG-based methods and the relevant algorithms, there is little research on applying the ZSPEG to multi-ISV target pursuit.The main reason for this is the high complexity, data scarcity, and many technical challenges in multi-ISV target pursuit.In this context, comparing other algorithms to the ZSG-based models, the fictitious play algorithm can provide more effective assistance.Firstly, fictitious play could simulate the behavior of the target, which could help the pursuit system to better understand the behavioral patterns of the target, thereby addressing the data scarcity of the target.Additionally, the fictitious play algorithm allows for the experiment to be validated and for an easier evaluation of the feasibility of different strategies and algorithms.This characteristic could prove advantageous for multi-ISV systems by mitigating the complexity during the surface pursuit.Thus, fictitious play is adopted as the solution framework for our developed ZSPEG-based, multi-ISV, target-pursuit model.
Building upon the above discussions, this paper formulates an innovative ZSPEGbased maneuver planning method for multiple-pursuit ISVs.The main contributions are summarized as follows: i By employing the ZSPEG framework, the surface pursuit system gains the ability to explore and analyze the intense competition between the pursuit team and the target.ii Through the utilization of fictitious play, the data scarcity concerning the target can be alleviated, thereby reducing the complexity associated with surface pursuit.
iii Under the fictitious play framework, a mixed-strategy Nash equilibrium (MNE)-based decision-making process is employed.This approach could derive the best responses to maneuver the surface vehicles effectively and stably.
The subsequent sections of this paper follow a structured organization.Section 2 provides the details of the multi-ISV target-pursuit model, which is based on the ZSPEG framework.Following this, Section 3 introduces the design of a motion planner for the pursuit vehicles, utilizing the MNE methodology.Section 4 shows the execution of the simulations.Finally, Section 5 presents the conclusions drawn in the study.

System Model
ZSG means that, under strict competition conditions, one player gains benefits while another player suffers loss.The sum of their benefits is zero [32].The development of a ZSPEG model for a multi-ISV pursuit system includes three foundational elements in the ZSG-based model: the two players, their strategy sets, and the payoff function.

Game Scenario and the Players
A classic ZSG model can be formulated as follows: where S 1 denotes the strategy set for player 1.It is assumed that is the positive integer.Similarly, S 2 denotes the strategy set for player 2, which is assumed to be F represents the payoff matrix for player 1.As for the state in the ZSG, when i ∈ {1, for player 1 is obtained.Different rows in F denote the different strategies of player 1, while different columns in F denote the different strategies chosen by player 2.
The pursuit scenario with n ISVs pursuing a mobile target is presented in Figure 1.In accordance with this scenario, the constructed ZSSG model includes two players: the pursuit team, consisting of n vehicles, and the moving target.Then, the ZSSG-based model for the multi-ISV target-pursuit scenario could be formulated by G = S p , S e , F , where S p and S e denote the strategy sets for the two players.It should be noted that the red star in the figure represents the attached points of the moving target.Once the target reaches this point, the pursuit team fails the protection task.F represents the payoff matrix for the pursuit team.ii Through the utilization of fictitious play, the data scarcity concerning the target can be alleviated, thereby reducing the complexity associated with surface pursuit.iii Under the fictitious play framework, a mixed-strategy Nash equilibrium (MNE)based decision-making process is employed.This approach could derive the best responses to maneuver the surface vehicles effectively and stably.
The subsequent sections of this paper follow a structured organization.Section 2 provides the details of the multi-ISV target-pursuit model, which is based on the ZSPEG framework.Following this, Section 3 introduces the design of a motion planner for the pursuit vehicles, utilizing the MNE methodology.Section 4 shows the execution of the simulations.Finally, Section 5 presents the conclusions drawn in the study.

System Model
ZSG means that, under strict competition conditions, one player gains benefits while another player suffers loss.The sum of their benefits is zero [32].The development of a ZSPEG model for a multi-ISV pursuit system includes three foundational elements in the ZSG-based model: the two players, their strategy sets, and the payoff function.

Game Scenario and the Players
A classic ZSG model can be formulated as follows: where  denotes the strategy set for player The pursuit scenario with  ISVs pursuing a mobile target is presented in Figure 1.In accordance with this scenario, the constructed ZSSG model includes two players: the pursuit team, consisting of  vehicles, and the moving target.Then, the ZSSG-based model for the multi-ISV target-pursuit scenario could be formulated by  =  ,  ,  , where  and  denote the strategy sets for the two players.It should be noted that the red star in the figure represents the attached points of the moving target.Once the target reaches this point, the pursuit team fails the protection task. represents the payoff matrix for the pursuit team.

Strategy Set
Before designing the maneuver strategy sets, the kinematic model of the surface vehicle should be formulated.When the vehicle moves on the surface, three primary types of motion on the horizontal plane are typically identified: surge, sway, and yaw.The corresponding motion analysis is presented in Figure 2. The motions of the vehicle are described using two reference coordinates: the geodetic reference coordinate ∑ I and the vehicle body reference coordinate ∑ B [33].As shown in Figure 2, when i ∈ {1, 2, • • • , n}, under ∑ I , the position of the ith vehicle is x i = [x i , y i ] T .u i is the speed at which the vehicle oscillates left and right along Y B , which is usually set to be zero.v i is the speed at which the boat oscillates forward and backward along X B .ϕ i is the yaw angle of the vehicle.v c is the velocity of the currents at ∑ I .

Strategy Set
Before designing the maneuver strategy sets, the kinematic model of the surface vehicle should be formulated.When the vehicle moves on the surface, three primary types of motion on the horizontal plane are typically identified: surge, sway, and yaw.The corresponding motion analysis is presented in Figure 2. The motions of the vehicle are described using two reference coordinates: the geodetic reference coordinate ∑ and the vehicle body reference coordinate ∑ [33].As shown in Figure 2, when  ∈ {1,2, ⋯ , }, under ∑ , the position of the th vehicle is   =  ,  . is the speed at which the vehicle oscillates left and right along  , which is usually set to be zero. is the speed at which the boat oscillates forward and backward along  .ϕ is the yaw angle of the vehicle.  is the velocity of the currents at ∑ .Under Assumption 1 and the discrete maneuver, the three-degrees-of-freedom kinematics model of the th vehicle is shown as follows: where the control variable is the unit vector of yaw angle    at the current th step, where  is a non-negative integer.   + 1 represents the position of the vehicle at the next time step.   represents the position at the current th step.∆ is the unit time period.     is a standard vector-scalar multiplication, where scalar  scales the components of vector   .This results in a new vector ( cos ϕ ,  sin  ), which represents the velocity of the vehicle in Cartesian coordinates.Based on the above introduction,   () is what should be determined for the pursuit vehicle in the ZSSG-based model.Then, if there are  yaw vectors for the th vehicle to choose, there would be  choices in the  for the pursuit team.Therefore, to reduce the computational overhead, one decomposing method is adopted in this paper.The decomposing method that is employed is named the target-guided relay-pursuit (TGRP) strategy, which is elaborated in Definition 1.This strategy hinges on the principle of relay pursuit, where only one vehicle is active at any given time, while the others remain on standby, ready to take over the pursuit as the scenario evolves.There are two reasons for adopting the TGRP strategy.The core reason behind using TGRP is to dynamically assign the pursuit role to the ISV that is best positioned to continue the chase, minimizing the overall time to intercept and enhancing the strategic positioning of the pursuit team.Furthermore, if fewer ISVs are actively maneuvering at any one time, the risk of collisions is significantly decreased, enhancing safety and operational integrity.However, while the Under Assumption 1 and the discrete maneuver, the three-degrees-of-freedom kinematics model of the ith vehicle is shown as follows: where the control variable is the unit vector of yaw angle r i (k) at the current kth step, where k is a non-negative integer.x i (k + 1) represents the position of the vehicle at the next time step.x i (k) represents the position at the current kth step.∆t is the unit time period.v i (k)r i (k) is a standard vector-scalar multiplication, where scalar v i scales the components of vector r i .This results in a new vector (v i cos ϕ i , v i sin ϕ i ) , which represents the velocity of the vehicle in Cartesian coordinates.Based on the above introduction, r i (k) is what should be determined for the pursuit vehicle in the ZSSG-based model.Then, if there are N yaw vectors for the ith vehicle to choose, there would be N n choices in the S p for the pursuit team.Therefore, to reduce the computational overhead, one decomposing method is adopted in this paper.The decomposing method that is employed is named the target-guided relay-pursuit (TGRP) strategy, which is elaborated in Definition 1.This strategy hinges on the principle of relay pursuit, where only one vehicle is active at any given time, while the others remain on standby, ready to take over the pursuit as the scenario evolves.There are two reasons for adopting the TGRP strategy.The core reason behind using TGRP is to dynamically assign the pursuit role to the ISV that is best positioned to continue the chase, minimizing the overall time to intercept and enhancing the strategic positioning of the pursuit team.Furthermore, if fewer ISVs are actively maneuvering at any one time, the risk of collisions is significantly decreased, enhancing safety and operational integrity.However, while the TGRP strategy is designed to be highly effective under a wide range of conditions, it is necessary to clarify that it may not always represent the absolute optimal strategy.Its performance can be considered close to optimal in scenarios characterized by complex dynamics and the need for rapid tactical shifts.
Definition 1 (TGRP strategy [34]).In the TGRP strategy, only one vehicle is active, while the others are stationary.This relay-pursuit mechanism could effectively avoid collisions among the vehicles.Furthermore, the distribution of the active vehicle changes over time, which depends on the outcome of the game at each time step.If the active vehicle is determined by index i * , the heading vector r i for the ith pursuit ISV could be obtained by Equation (3).
where a i := x t − x i is the relative position vector from the target to the ith vehicle.0 ∥ a i denotes the heading angle for the inactive vehicle towards the direction of the target.
Under the TGRP strategy, the pursuit team has n total motions.When i ∈ {1, . . . ,n}, the ith motion denotes that the ith vehicle actively pursues the target.By utilizing the fictitious play, the restricted strategy set S t for the target is assumed to include n + 1 choices.The former n actions represent an evasion of the corresponding pursuers.The n + 1 action means that the target is moving towards its target node x n .Then, when j ∈ {1, . . . ,n + 1}, r t j denotes the jth maneuver strategy for the moving target.The payoff matrix F for the pursuit team is demonstrated in Table 1.
Table 1.Formulation of the payoff matrix.

Strategy of the Invading Vehicle: S e
Evading P 1 Evading P j Evading P n Towards x a

Strategy of the pursuit ISV: S p
Active

Pursuit Payoff Function
In this paper, two goals are set for each pursuit vehicle.The first goal is to capture the target as quickly as possible.The second is to prevent the target from reaching its intended node.The two goals are the minimum time required for the vehicle to capture the target, and the alignment of the target with the point of attack from its current position, respectively.

Minimum Time-to-Capture
To calculate the minimum time-to-capture, several parameters need to be introduced.x t , v t , and r t represent the location, the surge velocity, and the unit yaw vector for the target, respectively.l is the length of the vehicle.
Then, a time metric is introduced [23].It is associated with the minimum positive solution, t x t , x i , r t j , r i , in the following equation: where a i := x t − x i .As the time-of-capture (T C ) ij is bounded by the T U , the final (T C ) ij can be obtained as follows: (T C ) i,j = min t x t , x i , r t j , r i , T U . (5)

Avoidance of Being Attacked
There is a goal to avoid the target reaching the point of attack: x a , (A N ) :,j denotes the extent to which target is heading towards x a from its current location.Therefore, (A N ) :,j is assumed to be cos θ, where θ is the angle between r t j and vectors x a − x t .Therefore, (A N ) :,j is formed as follows: where (A N ) :,j is the alignment metric for the target with respect to its attack node under the jth strategy, and cos represents the cosine of the angle between the target's heading vector r t j and the vector pointing from the target to the attack node x a − x t .This measures how well-aligned the target's heading direction is with the vector pointing towards the attack node.According to Equation ( 6), a higher value of (A N ) :,j (close to 1) indicates that the target is moving directly towards the attack node, suggesting a higher likelihood of an attack.A lower value (close to −1) suggests the target is moving directly away from the attack node.Values of around 0 indicate perpendicular movement, implying that the target is neither approaching nor directly retreating from the attack node.This metric helps to evaluate the level of threat posed by the target to the attack node.By continuously monitoring (A N ) :,j , the pursuit system can prioritize targets that are better aligned with critical points, thus optimizing the pursuit strategies.

Goal/Payoff Function
To balance multiple critical aspects of the pursuit dynamics, the goal/payoff function ( 7) is formed based on the two introduced metrics: the minimum time-to-capture T C and the avoidance of being attacked A N .This goal function could reflect both the urgency of intercepting the target and the strategic necessity of managing risks.
where the minimum time-to-capture (T C ) i,j is normalized with the maximum value in vector T C .This ensures that all values of (T C ) i,j max i,j T C are uniformed between zero and one so that their size is similar to the values of (A N ) :,j .
There are two main motivations for adopting this goal/payoff function.First, this function facilitates a more holistic evaluation of pursuit strategies by not only focusing on quick interception but also ensuring that the maneuvers are strategically sound and sustainable over the long term.Second, the inclusion of both T C and A N allows the function to adapt to different tactical situations, providing a flexible tool that can adjust to the target's behavior and environmental variables.Based on function (7), the payoff assignment is presented in the Algorithm 1, which formulates the payoff matrix under the pursuit station (x t , v t , x a , x i , v i ).The calculated payoff matrix would assist in the latter solution to the proposed ZSPEG-based model.

Methodology
The solution to the proposed ZSPEG-based model is to calculate the accurate and stable equilibrium point.Two types of Nash equilibrium exist in the ZSG-based models: pure-strategy Nash equilibrium and MNE.

Pure-Strategy Nash Equilibrium
A pure-strategy Nash equilibrium exists when each player selects a deterministic strategy.To obtain this equilibrium, players do not employ probability distributions.Based on the definition of the Nash equilibrium in [35], the pure-strategy Nash equilibrium for the proposed ZSPEG-based model G := S p , S t , F could be explained by Definition 2.

Definition 2 (Pure-strategy Nash equilibrium [36]
).Firstly, it is assumed that the pursuit team selects the i * th vehicle and the target chooses motion r t j * , which satisfies the following conditions: where F ij = f i under the condition that r t = r t j .Then,(i * , j * ) would be a pure-strategy Nash equilibrium if, and only if, Equation ( 10) is satisfied.
Corollary 1.There exists a situation where there is no pure-strategy equilibrium value in the pursuit model.
For the best i ′ on the right-hand side of Equation ( 11), the corresponding best min Given the potential absence of a pure-strategy Nash equilibrium, MNE would be calculated for the proposed model based on the minimax theorem.MNE denotes a scenario in a game where players employ probability distributions to select various pure strategies.The minimax theorem, proposed by Jon Von Neumann, provides a solution that incorporates strategic randomness [37].
Assume that the ISVs adopt the mixed strategy, p = [p 1 , • • • , p n ] T , to choose their motion, where ∀i ∈ {1, • • • , n}, 0 ≤ p i ≤ 1, and the sum of p is equal to 1. Similarly, the target is assumed to adopt the mixed strategy: q = (q 1 , • • • , q n+1 ).The constraints for q are the same as that for p.Therefore, under mixed strategies, the payoff for the pursuit team can be calculated by Equation (14).
The goal of the pursuit ISVs is to maximize its reward, while the aim of the target is to minimize the reward of the pursuit team.Hence, their payoff functions are expressed in the following two equations, respectively.
After formulating the two payoffs, the existence of MNE in the proposed model is given and explained by Theorem 1.

Theorem 1 (existence of MNE [38]
).For any two-player ZSG models, the following conditions could be satisfied: , where V * is called the minimax value of the game.• The Nash equilibrium in this ZSG model is (p * , q * ).
Proof of Theorem 1. Suppose that ∼ p, ∼ q is an MNE in this model.Then, the payoff for the ISV is obtained by Equation (17).
As both the sum of ∼ p and ∼ q are equal to one, according to the above equations, Equation ( 18) could be obtained.
Based on Corollary 1, it could be inferred that min Therefore, the value of V∼ q is equal to V∼ p .Equation ( 19) is subsequently formulated.
It could be concluded that when ∼ p, ∼ q is an MNE, ∼ p and ∼ q must be the optimal strategies for the pursuit team and the target, respectively.□ Definition 3 (best response strategy [39]).Suppose that p −* is the mixed strategy set other than p * ; then, the best response strategy for p * is formulated by Equation (20).
• The equation must be constructed from the expected reward u.
• BR p * takes the mixed strategy in the p −* as the independent variable.Correspondence: △ −* → △ .• All the mixed strategies in theBR p * have no differences.Corollary 2. p * is the best response to q * , and vice versa.(19), the following equations could be formulated:

Proof of Corollary 2. Based on Equation
V q * = max s P (Fq).
The expected reward in Definition 3 for p * and the q * can be calculated by Equations ( 23) and (24), respectively.
The best response for p * and q * is denoted by Equations ( 25) and (26).
max u q * , q −* = min p T Fq * , (28) where the solution to the minp * T Fq is q = q * , and the solution to the maxp T Fq * is p = p * .Finally, Equations ( 29) and ( 30) can be formulated.
Based on the above equations and proofs, q * is the best response to p * , and vice versa.□

Computing Pursuit Motions
As MNE proved to be the best response in the proposed model, the solution to the max s p (Fq) or min s t p T F can provide the optimal and stable motions for the vehicles.The calculation process is as follows.First of all, p T Fq is set as V.Then, as the values in q are summed to be 1, the minimum value of p T Fq is equal to the minimum value in the vector of p * T F .Therefore, solving the max Fq problem is equivalent to solving the following linear programming problem (31): With the assistance of Algorithm 1 in generating the payoff matrix F, the MNE-based maneuver planning algorithm is shown in Algorithm 2. Based on the MNE framework derived from game-theoretic principles, Algorithm 2 aims to dynamically adjust the strategies of multiple pursuit agents based on real-time feedback from the operational environment and the target's maneuvers.The algorithm calculates the Nash equilibriums for different strategic setups, allowing each pursuit vehicle to select the best possible action that maximizes the collective chances of success.
Algorithm 2: MNE-based maneuver planner for pursuit vehicles.

Step-by-Step Process
Based on the above introduction, the flowchart illustrating the different phases of the algorithm in Section 3 is provided in Figure 3.

Simulations
In this section, we present the results obtained from the simulations and discuss their implications.The objective is to evaluate the performance of the proposed ZSPEG-based model and its effectiveness in achieving stable and accurate equilibrium points.We will analyze the outcomes of different scenarios and compare them with baseline methods to demonstrate the advantages of our method.The results will be examined in terms of timeto-capture, maneuver efficiency, etc.
To set the parameters before the simulations, a series of parameters were chosen based on their relevance to the performance and functionality of the ISV during pursuit tasks.The chosen parameters are presented in Table 2.The selection criteria for these parameters were as follows: (i) Sway velocity of the vehicle ()-the sway velocity range of 5 m/s to 15 m/s was determined by referring to the currently used ISVs, ensuring that the vehicle could adapt to the various operational speeds required for different scenarios.(ii) Sway velocity of the target ( )-a fixed value of 10 m/s was chosen based on the average target speeds observed in similar pursuit and evasion scenarios.This value balances the difficulty for the ISV to capture the target while allowing for a meaningful analysis of the algorithm's effectiveness.(iii) Length of the vehicle ()-a standard length of 4 m was selected to represent the typical dimensions of ISVs used in practical applications, ensuring the results are applicable to real-world scenarios.(iv) Intercepted distance to target ( )the value of 12 m was derived based on the length of the vehicle, indicating the optimal interception distance for successful captures while minimizing the risk of overshooting or collision.(v) Attack distance of target ( )-an attack distance of 6 m was chosen based on the ISV's common response capabilities, ensuring effective engagement.(vi) Safe distance between the ISVs ( )-a safe distance of 9 m was determined to prevent collisions between multiple ISVs operating within the same area, allowing for coordinated maneuvers.

Simulations
In this section, we present the results obtained from the simulations and discuss their implications.The objective is to evaluate the performance of the proposed ZSPEG-based model and its effectiveness in achieving stable and accurate equilibrium points.We will analyze the outcomes of different scenarios and compare them with baseline methods to demonstrate the advantages of our method.The results will be examined in terms of time-to-capture, maneuver efficiency, etc.
To set the parameters before the simulations, a series of parameters were chosen based on their relevance to the performance and functionality of the ISV during pursuit tasks.The chosen parameters are presented in Table 2.The selection criteria for these parameters were as follows: (i) Sway velocity of the vehicle (v)-the sway velocity range of 5 m/s to 15 m/s was determined by referring to the currently used ISVs, ensuring that the vehicle could adapt to the various operational speeds required for different scenarios.(ii) Sway velocity of the target (v t )-a fixed value of 10 m/s was chosen based on the average target speeds observed in similar pursuit and evasion scenarios.This value balances the difficulty for the ISV to capture the target while allowing for a meaningful analysis of the algorithm's effectiveness.(iii) Length of the vehicle (l)-a standard length of 4 m was selected to represent the typical dimensions of ISVs used in practical applications, ensuring the results are applicable to real-world scenarios.First of all, to compare the proposed ZSPEG-based planning scheme in both the single-ISV scenario and the multi-ISV scenario, the strategy set and the reward function for the single-ISV pursuit scenario are constructed as follows.In the single-ISV pursuit scenario, the heading vector of the ISV is limited to within their field of vision constraints, which are denoted by FoV P .Therefore, r p , the unit yaw vector for the pursuit vehicle, is limited to the range of − FoV P 2 , FoV P

2
. Furthermore, the maneuver planning for the ISV is closely related to its minimum steering angle (MSA).Then, under the constraints of the FoV and MSA, there could be ⌊FoV P /MSA P ⌋ options for ISV to move.Hence, the ith strategy for the pursuit vehicle is denoted by r p i = i − FoV P 2 , i ∈ {1, 2, • • • , ⌊FoV P /MSA P ⌋}.Similarly, the maneuver strategy set for the moving target could be assumed by r t j = i − FoV T 2 , j ∈ {1, 2, • • • , ⌊FoV T /MSA T ⌋}.The payoff function for the single-ISV target pursuit is designed to consider two factors: the distance between the pursuit ISV and the target, and the distance between the target and the point of attack.These two factors could be formulated by the following equations: where (D P ) i,j denotes the pursuit feedback of the distance between the vehicle and the target.(D T ) i,j denotes the attacking feedback of the distance between the target and the intended node.d 1 p r p i , r t j and d 1 t r t j denote the corresponding distance in the future situation after adopting the related motions.d 0 p and d 0 t represent the distances in the current situation.Then, the payoff for the pursuit ISV can be obtained by Equation (34).
where K A , K B , and K C are all constants.
To generate and analyze the trajectories under the single-ISV pursuit model, the initial position for the moving target is set to along the boundary of the protected area.The position of the unknown attacked node should be within the protected area.Thus, it is assumed that x t (0) = (80,5), x a = (20,10), x p (0) = (40,22), FoV T = 120 • , and MSA T = 1 • .After being solved by Algorithm 2, under different FoV P and MSA P configurations, nine trajectories in the single-ISV pursuit are shown in Figure 4.Although the rational motions were sequentially generated for the ISV in time, the vehicle fails to capture the target in all the situations.If the ISV does not fulfil the necessary conditions, the target could gain the initiative to attack its intended node.Therefore, it is an inevitable trend that multi-ISV pursuit replaces single-ISV pursuit.

Model Evaluation
In this subsection, we evaluate our proposed ZSPEG-based model by comparing it with two other target-pursuit methods through a series of simulations.The compared methods include the following: Graph theory-based: There are many versions of Voronoi algorithms being applied in a multi-robot pursuit scenario [40].Among these Voronoi-based methods, the GAM-Voronoi algorithm outperforms the others in a bounded convex environment, which utilizes a global area-minimization (GAM) strategy [41].This was chosen as the representative of the Voronoi-related algorithms.
Cooperative target hunting: To compare our method with strategy-based cooperative pursuit methods, a popular method, called Cooperative Hunting based on the Dynamic Hunting Points Allocation (CH-DHPA) [42][43][44], is selected.This method transforms the target pursuit to the dynamic hunting points allocation problem.
Our Proposed Method (TGRP-MNE): This method aims to achieve stable and accurate equilibrium points by implementing the ZSPEG-based model.
The reasons why the GAM-Voronoi algorithm and CH-DHPA were chosen for the comparison with our proposed methods include two main criteria.First, both the GAM-Voronoi and CP-DHPA methods address the core issue of maneuver planning in pursuit scenarios, similar to our study's focus.This relevance ensures that the comparisons are directly applicable and meaningful to the field of pursuit-evasion games involving multiple ISVs.Second, by comparing these methods, this study could have significant theoretical implications regarding the applicability of game theory versus graph theory and cooperative strategies in real-world scenarios.
The required parameters, the pros, and the cons of the three methods are shown in Table 3.To ensure the fairness of the experiment, while providing our method with additional parameters and information, we established ideal environmental conditions for the other two methods.The relevant environmental elements and geographic information

Model Evaluation
In this subsection, we evaluate our proposed ZSPEG-based model by comparing it with two other target-pursuit methods through a series of simulations.The compared methods include the following: Graph theory-based: There are many versions of Voronoi algorithms being applied in a multi-robot pursuit scenario [40].Among these Voronoi-based methods, the GAM-Voronoi algorithm outperforms the others in a bounded convex environment, which utilizes a global area-minimization (GAM) strategy [41].This was chosen as the representative of the Voronoi-related algorithms.
Cooperative target hunting: To compare our method with strategy-based cooperative pursuit methods, a popular method, called Cooperative Hunting based on the Dynamic Hunting Points Allocation (CH-DHPA) [42][43][44], is selected.This method transforms the target pursuit to the dynamic hunting points allocation problem.
Our Proposed Method (TGRP-MNE): This method aims to achieve stable and accurate equilibrium points by implementing the ZSPEG-based model.
The reasons why the GAM-Voronoi algorithm and CH-DHPA were chosen for the comparison with our proposed methods include two main criteria.First, both the GAM-Voronoi and CP-DHPA methods address the core issue of maneuver planning in pursuit scenarios, similar to our study's focus.This relevance ensures that the comparisons are directly applicable and meaningful to the field of pursuit-evasion games involving multiple ISVs.Second, by comparing these methods, this study could have significant theoretical implications regarding the applicability of game theory versus graph theory and cooperative strategies in real-world scenarios.
The required parameters, the pros, and the cons of the three methods are shown in Table 3.To ensure the fairness of the experiment, while providing our method with additional parameters and information, we established ideal environmental conditions for the other two methods.The relevant environmental elements and geographic information were set to ideal situations devoid of external disturbances.Furthermore, we also idealized the communication situation for these two methods.As they involve distributed communication, effective communication among multiple ISVs is crucial during task allocation and team collaboration.Hence, to enhance fairness during the comparison simulation, signal transmission was also configured to be in an ideal state for the methods of GAM-Voronoi and CP-DHPA.Based on the above settings, simulations were conducted under the fixed initial positions for the ISVs.The relevant initial positions were set as follows: x 1 (0) = (40,30), x 2 (0) = (40,10), x 3 (0) = (50,20), and x 4 (0) = (30,20).Furthermore, x t (0) and x n (0) were randomly chosen.The settings for the other parameters are shown in Table 2.For every group, the simulations were carried out 1000 times.The assessed metrics include the capture success rate-r s (%); the rate of nodes being attacked-r a (%); the number of collisions-C; and the approximate average energy consumption per successful capture-AEC (m•s).Requires more parameters and pursuit information The selected three parameters were computed as follows: (i) The capture success rate r s (%)-this parameter represents the percentage of simulations where the pursuit strategy successfully led to the target being captured within the defined constraints and conditions.It is computed by dividing the number of successful captures by the total number of simulation runs and then multiplying by 1000 to express this as a percentage.(ii) The rate of nodes being attacked r a (%)-similar to the success rate, the avoidance rate measures the percentage of simulations in which the target successfully evaded capture for the duration of the simulation or until it reached a safe zone.This is calculated by counting the instances of successful evasion, dividing this by the total simulations, and converting the result into a percentage.(iii) The average energy consumption per successful capture AEC (m•s)-to compute the AEC, the total energy consumed in each run is summed up and then divided by the number of runs to obtain the average.This provides a measure of how much energy, on average, is required for a vehicle to participate in a pursuit scenario over the course of the simulations.Mathematically, it can be represented as , where E i is the energy consumed in the ith simulation and n is the total number of simulations.This calculation helps to understand the energy efficiency of the different tactical approaches under various operational conditions, which is crucial for optimizing the pursuit strategies in terms of both effectiveness and sustainability.
The simulation results under the varying survival times, the varying pursuit velocities, and the different areas are shown in Tables 4-6, respectively.It should be noted that only the CP-DHPA method has the collision parameter C.This is because the CP-DHPA method requires additional collision avoidance measures, for which we need to analyze its effectiveness in avoiding collisions.However, in the GAM-Voronoi method, collision avoidance is inherently built into the method, preventing any collisions from occurring.In our TGRP-MNE method, a relay-pursuit strategy is adopted.Therefore, collisions between pursuit vehicles are not a concern, either.As presented by these three tables, our method outperforms the other two methods in achieving a relatively high success rate of capture r s and the lowest average energy consumption AEC.To be specific, when comparing Tables 4 and 5, we observed that the rate of nodes being attacked, r a , in the CP-DHPA method is lower than that in our proposed TGRP-MNE method.This phenomenon indicates that that our TGRP-MNE method might implement a more effective or aggressive pursuit strategy.Such a strategy could better block the routes of the target or more accurately predict the target's movements, but may not provide sufficient protection for the target's attack point.Furthermore, according to Table 6, our proposed method demonstrates a superior performance compared to GAM-Voronoi and CP-DHPA in terms of its success rate and energy consumption across all area scales.Based on these findings, if node protection is the primary goal in surface pursuit, CP-DHPA would be the preferred choice among the three methods.However, in general situations, achieving a high and stable success rate of capture is paramount for the surface pursuit.With adaptability to factors T S , v p , and the area scale, and the ability to achieve a high and stable success rate of capture and low energy consumption, our method can be considered the optimal choice for surface pursuit.

Sensitive Analysis
In this subsection, the sensitivity of our method under different conditions is tested and analyzed.The conducted experiments include the following: (i) analysis of n, T S , and v p , where we tested our methods under multiple scenarios with different initial conditions, including the number of vehicles n, the survival time T S , and the pursuit velocity v p ; (ii) system stability, where we assessed the stability of the pursuit system by observing the consistency of the pursuit strategies over extended simulation runs with different target strategy sets.These experiments could allow for a comprehensive evaluation of the proposed method, demonstrating its advantages and identifying potential areas for improvement.
Analysis of n, T S , and v p : To explore the impact of variables on the performance, simulations are conducted using different numbers of vehicles n, various survival times T S , and changing pursuit velocities v p .The results are shown in Figure 5.In Figure 5a, once T S exceeds 1.6 s, all the success rates would be stable, with a small fluctuation of around 85%.As shown in Figure 5b, there were no significant differences or changes among the data.The rates of attack were all below 15%.The above observations indicate that when the survival time exceeds a certain value, further increasing T S no longer contributes to improvements in the performance.According to the results shown in Figure 5c,d, with the pursuit velocity increases, the capture success rate in all the teams increases, while the rates of the node being attacked all decrease.To obtain a higher success rate and a lower rate of being attacked in a pursuit team, improving the pursuit velocity could be helpful.Furthermore, the team with n = 5 always achieves the best pursuit performance due to its having the highest number of vehicles.In sum, the variation in the team-scale and the pursuit velocity has a more significant impact on the model's performance compared to the changes in time.Therefore, during the optimization of the system, particular attention should be paid to the n and v p adjustments.
System stability: To determine how the strategy set of the target affects the pursuit, the strategy sets for the target are redesigned.For a pursuit team with n = 3, the strategy sets for the target are reformulated by the two newly designed sets: S tg and S tc .The first strategy set is based on the ZSG model, which was introduced in Section 2.2.S tc is more complex than the ZSG-based strategy set and the original one.The detailed strategies in S tc are shown as follows, and are also displayed in Figure 6: r t 1 : the target evades from the nearest ISV.r t 2 : the target adopts the collective evasion strategy, which is explained in Definition 4. r t 3 : the target heads directly toward its attacked node, denoted by the red star in Figure 6.r t 4 : the direction of the target is the angle bisector formed by r t 1 and r t 2 .
a lower rate of being attacked in a pursuit team, improving the pursuit velocity could be helpful.Furthermore, the team with  = 5 always achieves the best pursuit performance due to its having the highest number of vehicles.In sum , the variation in the team-scale and the pursuit velocity has a more significant impact on the model's performance compared to the changes in time.Therefore, during the optimization of the system, particular attention should be paid to the  and  adjustments.System stability: To determine how the strategy set of the target affects the pursuit, the strategy sets for the target are redesigned.For a pursuit team with  = 3, the strategy sets for the target are reformulated by the two newly designed sets:  and  .The first strategy set is based on the ZSG model, which was introduced in Section 2.2. is more complex than the ZSG-based strategy set and the original one.The detailed strategies in  are shown as follows, and are also displayed in Figure 6:    : the target evades from the nearest ISV.   : the target adopts the collective evasion strategy, which is explained in Definition 4.    : the target heads directly toward its attacked node, denoted by the red star in Figure 6.   : the direction of the target is the angle bisector formed by  and  .

Definition 4 (collective evasion strategy). From the perspective of the target, all angles formed between the two adjacent ISVs are taken into account in the collective evasion strategy. Additionally, the moving direction of the target is the parallelogram of the maximum angle formed by the two adjacent
ISVs.The purpose of this strategy is to enable the target to move immediately away from the entire pursuit team, rather than just from one ISV.The calculation for this evasion motion is as follows: Firstly,  ≔ ∡( ,  ) is set to represent the angle of the vector ( −  ). ≔  −  denotes the angle between two adjacent ISVs.If  = ,  is equal to  .Therefore,  could be obtained by Equation (35).
where  is the index when  is taken to its maximum value. is half of the  .
With changing survival times, simulations were carried out under  =  ,  =  and  =  , respectively.The results are shown in Figure 7.In terms of the success rate of capture and the rate of being attacked, it is apparent that there is no significant difference among the three strategy sets.This discovery underscores the robustness of our proposed method, showcasing its ability to maintain a consistent performance despite variations in the target's strategy set.

Definition 4 (collective evasion strategy).
From the perspective of the target, all angles formed between the two adjacent ISVs are taken into account in the collective evasion strategy.Additionally, the moving direction of the target is the parallelogram of the maximum angle formed by the two adjacent ISVs.The purpose of this strategy is to enable the target to move immediately away from the entire pursuit team, rather than just from one ISV.The calculation for this evasion motion is as follows: Firstly, α i := ∡(x t , x i ) is set to represent the angle of the vector (x t − x i ).β i := α i+1 − α i denotes the angle between two adjacent ISVs.If i = n, α i+1 is equal to α 1 .Therefore, r t 2 could be obtained by Equation (35).r t 2 = cos β i mh + β i m , sin β i mh + β i m T (35) where i m is the index when β i is taken to its maximum value.β i mh is half of the β im .
With changing survival times, simulations were carried out under S t = S tg , S t = S tc and S t = S to , respectively.The results are shown in Figure 7.In terms of the success rate of capture and the rate of being attacked, it is apparent that there is no significant difference among the three strategy sets.This discovery underscores the robustness of our proposed method, showcasing its ability to maintain a consistent performance despite variations in the target's strategy set.

Conclusions
This paper introduced a novel ZSPEG-based maneuvering method for multiple ISVs to pursue a moving target.The proposed method addresses the challenge of realizing the interaction between the pursuit team and the target and mitigating the data scarcity of the target.At first, the original  + 1 non-cooperative multi-ISV game model was reformulated into a two-player ZSPEG model.Based on the fictitious play solution, an MNE-based decision-making process was designed to provide the vehicles with rational and stable motions.Finally, the simulation results show that the proposed method achieves a comprehensively outstanding performance, obtaining high capture success rates, low energy costs, generalization, and adaptability.Nevertheless, the proposed model depends on centralized computing to achieve a consensus among multiple ISVs, which limits its adaptability to more intricate surface environments.To enhance the navigation of the surface vehicles under constrained surface communication conditions, future research will integrate decentralized computing into the proposed method.

Conclusions
This paper introduced a novel ZSPEG-based maneuvering method for multiple ISVs to pursue a moving target.The proposed method addresses the challenge of realizing the interaction between the pursuit team and the target and mitigating the data scarcity of the target.At first, the original n + 1 non-cooperative multi-ISV game model was reformulated into a two-player ZSPEG model.Based on the fictitious play solution, an MNE-based decision-making process was designed to provide the vehicles with rational and stable motions.Finally, the simulation results show that the proposed method achieves a comprehensively outstanding performance, obtaining high capture success rates, low energy costs, generalization, and adaptability.Nevertheless, the proposed model depends on centralized computing to achieve a consensus among multiple ISVs, which limits its adaptability to more intricate surface environments.To enhance the navigation of the surface vehicles under constrained surface communication conditions, future research will integrate decentralized computing into the proposed method.

Figure 1 .
Figure 1.Schematic of the surface pursuit.Figure 1.Schematic of the surface pursuit.

Figure 1 .
Figure 1.Schematic of the surface pursuit.Figure 1.Schematic of the surface pursuit.

Figure 2 .Assumption 1 .
Figure 2. Top-view of the ISV motion.Assumption 1.It is assumed that all the vehicles have a constant-speed motion, and the sway velocity for the individual vehicle is zero.

Algorithm 1 :
Payoff assignment for the pursuit team

Figure 3 .
Figure 3.The overall step-by-step process of the proposed algorithm.
(vii) Interval time (Δ)-an interval time of 0.01 s was selected to ensure precise and responsive updates to the ISV's path-planning algorithm, enabling real-time adjustments.(viii) Survival time of target ( )-the survival time of 2.0 s was established to assess the ISV's rapid response capabilities, providing a challenging yet achievable timeframe for the pursuit.(IX) Protected area ()-the protected area of 40 m × 80 m was set to represent a controlled environment where the ISVs and targets operate, allowing for consistent and repeatable testing conditions.

Figure 3 .
Figure 3.The overall step-by-step process of the proposed algorithm.
(iv) Intercepted distance to target (d c )-the value of 12 m was derived based on the length of the vehicle, indicating the optimal interception distance for successful captures while minimizing the risk of overshooting or collision.(v) Attack distance of target (d a )-an attack distance of 6 m was chosen based on the ISV's common response capabilities, ensuring effective engagement.(vi) Safe distance between the ISVs (d s )-a safe distance of 9 m was determined to prevent collisions between multiple ISVs operating within the same area, allowing for coordinated maneuvers.(vii) Interval time (∆t)-an interval time of 0.01 s was selected to ensure precise and responsive updates to the ISV's path-planning algorithm, enabling real-time adjustments.(viii) Survival time of target (T S )-the survival time of 2.0 s was established to assess the ISV's rapid response capabilities, providing a challenging yet achievable timeframe for the pursuit.(IX) Protected area (A)-the protected area of 40 m × 80 m was set to represent a controlled environment where the ISVs and targets operate, allowing for consistent and repeatable testing conditions.

Figure 4 .
Figure 4. Parameter assumption for the monomer pursuit.

Figure 4 .
Figure 4. Parameter assumption for the monomer pursuit.

Figure 5 .
Figure 5. Performance of the pursuit team with different N under (a,b) varying survival times and (c,d) varying pursuit velocities.Figure 5. Performance of the pursuit team with different N under (a,b) varying survival times and (c,d) varying pursuit velocities.

Figure 5 .
Figure 5. Performance of the pursuit team with different N under (a,b) varying survival times and (c,d) varying pursuit velocities.Figure 5. Performance of the pursuit team with different N under (a,b) varying survival times and (c,d) varying pursuit velocities.

Figure 6 .
Figure 6.The enriched motions for the target.

Figure 6 .
Figure 6.The enriched motions for the target.

Figure 7 .
Figure 7. Performance under the three strategy sets for the target.

Figure 7 .
Figure 7. Performance for the three strategy sets for the target under (a,b) varying survival times.

Table 2 .
Parameters chosen for the monomer pursuit.

Table 3 .
Preliminaries of the three methods.

Table 4 .
Comparison under a varying T S with v p = 15 m/s.

Table 5 .
Comparison under a varying v p with T S = 2.0 s.

Table 6 .
Comparison under different areas.