Cooperative Threat Engagement Using Drone Swarms

The ability of multiple manned and unmanned aircraft systems to cooperatively engage and disable an aerial threat plays a decisive role in modern warfare scenarios. In this paper, we apply key methods to enable the so-called cooperative threat engagement capability among multiple networked agents, e.g., a swarm of drones, with combat and communication capabilities. In particular, this research combines AI-based decision-making and control techniques for a swarm of loyal wingman drones to coordinate efficient defense actions in a cooperative and autonomous manner. We apply these concepts in a defense scenario that is modeled to analyze the loyal wingman concept, which we consider an interesting testbed for cooperative decision-making and low-level control techniques. The investigated methods were implemented in a realistic 3D UAV simulator for demonstration and evaluation.


I. INTRODUCTION
The cooperative engagement capability (CEC) is an emerging systems-of-systems capability in which multiple systems coordinate their actions through a network to improve their abilities to effectively perform a given task [1]. The term was coined in the 90s in the Johns Hopkins University Applied Physics Laboratory in an effort to enhance the United States Navy's defense capability. The project aimed to improve the situational awareness of a fleet by sharing tactical information among data link participants and speeding up the decision process regarding the interception, engaging, and neutralization of a flying threat, e.g., a cruising missile or aircraft. With the recent advances in aerial robotics, the CEC concept can be extended to unmanned flying platforms The associate editor coordinating the review of this manuscript and approving it for publication was Zhong Wu .
to promote group behavior towards the achievement of a common purpose via a shared communication network. In this context, drone swarms are of particular interest since they enable the use of several low-cost and even disposable platforms in modern defense or surveillance applications.
Recently, the CEC concept has been studied in modern defensive scenarios using autonomous robot swarms [2], [3], [4], [5], [6]. In particular, [2] studied the twodimensional decision-making problem using adversarial UAV swarms. The mission of each swarm is to defend its base and destroy the adversarial one. To this end, each UAV is equipped with a weapon. Reference [3] studied the intelligent decision-making for an UAV swarm whose mission is to identify and attack specific targets. Similarly, [5] studied a reconnaissance and attack scenario using heterogeneous UAV swarms. These swarms are divided by subsets containing reconnaissance UAVs, attack UAVs, and VOLUME 11, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ reconnaissance-attack UAVs. On the other hand, [4] addresses the defense problem of a critical area using a swarm of UAVs against a larger one. Reference [6] focused on artificial intelligence models for offensive air-to-air combat scenarios using UAVs equipped with machine guns. In particular, manned-unmanned teaming (MUM-T) is an essential component of the future unmanned aircraft systems' operational environment. It extends the CEC concept by promoting the seamless integration of unmanned platforms, endowed with some level of autonomy, with manned ones. The unmanned platforms aim to perform given tasks under the command of the manned ones. This integration combines the inherent strengths of both platforms to achieve synergy not seen in single-agent systems, thus enabling a higher degree of effectiveness for the team of networked agents. Specifically, the MUM-T context includes the concept of the loyal wingman (LW), which is an unmanned aerial vehicle (UAV) under the tactical command of a high-level manned or remotely controlled leader [7]. Recently, we have witnessed a rapid emergence of multirotor aerial vehicles (MAVs) [8] in either research and industrial applications where aircraft size, weight, and cost play a critical role in their operation, effectiveness, or feasibility [9]. Therefore, we argue that low-cost experimental MAVs are a suitable alternative to investigate the loyal wingman concept in MUM-T scenarios.
This paper investigates the use of LW MAVs with combat capabilities to cooperatively engage and disable aerial explosive threats. Specially, we highlight that this work is an extended version of our previous work [10], which evaluated a 3D defense scenario where a MUM-T composed of loyal wingmen protects a leader UAV and a critical infrastructure from aerial threats. Here, we present a more detailed investigation of the same scenario. In this paper, the vehicles are modeled as fully actuated MAVs, which can perform independent attitude and position tracking [11], [12]. To tackle the design problems that arise from such context, we proposed a problem breakdown that focuses on both the high-level (HL) decision-making tasks and the low-level (LL) control aspects to enable the CEC among the leader and the loyal wingmen in a defense scenario. The LL control addresses the well-recognized robustness requirement for MAV flight control systems against disturbances and uncertainties by designing a joint geometric attitude-position control law using a multi-input first-order sliding mode strategy. Moreover, to provide the MAVs with collision avoidance capability, the reciprocal acceleration velocity obstacles (AVO) [13] method is used. On the other hand, the autonomous behavior of the MAVs is designed using state-of-art artificial intelligence (AI) techniques, by applying finite-state machines (FSMs) and behavior trees (BTs) decision-making algorithms [14] to select and execute modular behaviors for the LW and the aerial threats. We also focus on how the agents must act cohesively as a team, so that cooperation between the MUM-T can emerge, for this we use the classic Kuhn-Munkres task allocation algorithm [15] to distribute tasks between the MUM-T.
To the best of our knowledge, the main contributions of this paper are 1) an extension of the defense scenario proposed in [10] considering a three-dimensional environment with realistic flight dynamics, 2) the application of the reciprocal AVO collision-avoidance method and a multi-input sliding mode flight control law, and 3) the modeling of AI-based decision-making with FSMs and BTs for the proposed scenario.
The remaining of this paper is organized as follows. Subsection I-A presents the notation used along the paper. Section II presents the scenario of interest. Section III defines the LL and HL problem breakdown. Section IV presents the methodology. Section V shows the interface between the HL commands and the LL. Section VI shows the computational architecture used to simulate the scenario. Section VII evaluates the proposed solution using computer simulations. Finally, Section VIII concludes the paper and shares ideas for future work.

A. NOTATION
Matrices and algebraic vectors are denoted, respectively, by uppercase and lowercase boldface letters, while geometric (Euclidian) vectors are denoted as ⃗ a. Sets are denoted by uppercase calligraphic letters as D. A Cartesian coordinate system (CCS) is represented as with B denoting its origin, and ⃗ x b , ⃗ y b , and ⃗ z b representing the unit geometric vectors along its orthogonal axes. The algebraic vectors corresponding to the projection of an arbitrary physical vector ⃗ a onto S b and S g are denoted by a b ∈ R 3 and a g ∈ R 3 , respectively. The relation between a g and a b is (3) is the attitude matrix of S b relative to S g and SO(3) ≜ {D ∈ R 3×3 , | D T D = I 3 } represents the special orthogonal group. The inverse of D b/g is denoted by D g/b . The identity matrix of order n ∈ Z >0 is denoted by I n . Consider two arbitrary algebraic vectors x = (x 1 , . . . , x n ) and y = (y 1 , . . . , y n ). The vector inequality x < y means that x i < y i , ∀i ∈ {1, . . . , n}. The sign of x is defined as sign(x) ≜ (sign(x 1 ), . . . , sign(x n )), where Moreover, the Euclidean norm and component-wise absolute value of x are denoted, respectively, by ∥x∥ and |x| ≜ (|x 1 |, . . . , |x n |) ∈ R n . Similarly, the component-wise absolute value of a generic matrix A ∈ R m×n is abs (A) ≜ |A i,j | ∈ R m×n , where A i,j is the element of the ith row and jth column of A. The standard basis vectors of R 3 are denoted by e 1 ≜ (1, 0, 0), e 2 ≜ (0, 1, 0), and e 3 ≜ (0, 0, 1). Let ⃗ a i/g represent an arbitrary physical quantity of S i with respect to S g ; e.g., along the paper, ⃗ v i/g will denote the velocity of S i relative to S g . An open sphere of radius ρ ∈ R >0 centered at p ∈ R 3 , the Minkowski sum of two sets, and the set subtraction are Finally, consider the S g representations a g ≜ (a 1 , a 2 , a 3 ) and b g of ⃗ a and ⃗ b, respectively. The vector product ⃗ c = ⃗ a × ⃗ b is represented in S g by c g = a g × b g , where a g × is the following skew-symmetric matrix:

II. SCENARIO OF INTEREST
To address the concept of CEC among the MUM-T, we define a defense scenario where there are two teams composed of fully actuated MAVs: The MUM-T composed of LW MAVs that fly in formation alongside, i.e., escorting the manned leader, and must defend him as well as a protected area from attacks of an adversarial team, composed by a swarm of kamikaze threats, as depicted in Fig. 1. Therefore, the MUM-T mission is considered to fail if the leader is hit by a single kamikaze, or the protected area is hit by a fixed number of kamikazes. The protected area is equipped with ground assets that provide aerial surveillance capability to the MUM-T members.
The LW is autonomous and capable of making intelligent decisions based on the situational awareness information provided by the ground assets considering that they do not have onboard sensors to obtain information about the threats, such as cameras or LIDAR. To be able to neutralize the threats, the LW is equipped with two hypothetical types of weapons, a mid-range freezing gun and a short-range vaporizer gun, both with a limited number of cartridges and a fixed cool-down time interval, i.e., they become unavailable for a fixed-time period after shooting. The vaporizer gun can neutralize the threat, while the freezing gun slows down the threat by half of its maximum speed. The weapons' model is simplified, being the hitting success calculated by a given probability. Note that, the freezing gun is intended to make the decision space more complex, in practice, this type of weapon is infeasible given current technology, and the vaporizer gun is also an idealized weapon that uses energy to destroy only the electronic components of the kamikaze.
The leader is remotely controlled by a human and it is in charge of the formation coordination, i.e., it is capable of passing relative coordinate commands to the LW. As a matter of simplification, we assume that the leader can only command the loyal wingmen to fly within one predefined formation pattern. In this paper, we consider this unique pattern as an uniform-circular formation along the local horizontal plane with a desired radius. We also assume that the leader is always capable to command the loyal wingmen whenever required, i.e, it can override the decision-making commands of a specific LW in the MUM-T, either by enforcing the loyal wingmen to return to the formation, or selecting an LW to engage or shoot a specific threat.
On the other hand, the kamikaze MAVs are assumed to explode as soon as they collide with either the leader, an LW, or the protected area. When the explosion is triggered, its effect destroys the target vehicle or damages the protected area. Additionally, once a kamikaze selects a target, it will chase this target until the kamikaze explodes itself or is neutralized by a vaporizer gun. The number of kamikazes is fixed in the simulated scenario, i.e., they immediately respawn after being neutralized, thus keeping a steady stream of attacks. The kamikazes have a simplified AI but are assumed to be faster and more numerous than the MUM-T, thus forcing the MUM-T to cooperatively work in order to effectively neutralize the kamikazes.
The ground assets have a limited detection range and share situational awareness with the MUM-T, i.e., a vector with the state (position, attitude, and linear velocity) of all entities. To simplify the scope of this work, we assume the ground assets to be perfect radars, and that the communication among the MUM-T is perfect, with no delays, packet loss, or bandwidth limitation.

III. PROBLEM BREAKDOWN
In this section, we introduce a problem breakdown according to the scenario of interest. A common approach in robotic cognition involves decomposing the agent into software layers, such as situational awareness, decision-making, and control, as shown in Fig. 2 for a scenario containing a leader and n ∈ Z >0 autonomous MAVs. In this paper, we use a similar breakdown and focus on both the HL decision-making tasks and the LL control aspects, abstracting the situational awareness block in Fig. 2.
The HL decision-making layer receives state information from the situational awareness containing the pose and linear velocity of each object of interest in a global coordinate system. Once this information is received, the HL layer process the states using AI-based decision-making algorithms that output action commands. There are two types of action commands: rotational and translational movement commands and commands regarding the use of weapons. The LL control, on one hand, translates the movement actions VOLUME 11, 2023 into appropriate state commands that are used to calculate the control input to the MAV, and, on the other hand, allocates the weapon actions using the available resources. Moreover, the MAV module contains the rotational and translational dynamics of the vehicle, the control allocator that translates the control input into actuator commands, the actuator model, and the weapons model. Note that, the weapons' model is simplified, i.e., the aiming and projectile dynamics are instantaneous, being the hitting success calculated by a configurable probability. Lastly, we also assume that the MAVs share a synchronized internal model of the world in the sense that they are aware of the same threats and other agents state due to the ground-assets radars.
Each agent is identified by a unique ID. In this sense, we define a set I ≜ I m ∪ I k containing the identifier IDs of all agents, where I m is a set containing the identifier IDs of the MUM-T members and I k is a set containing the identifier IDs of the kamikazes.

A. LOW-LEVEL CONTROL PROBLEM
This section presents the translational and rotational dynamics of a general fully actuated MAV equipped with fixed rotors and defines the LL control problem. For a more detailed dynamic modeling, please see [12], [16], [17], and [18].
Consider a ground reference CCS S g ≜ G; ⃗ x g , ⃗ y g , ⃗ z g located at a known point G on the ground, with ⃗ z g oriented upwards, parallel to the local vertical. In addition, consider a body-fixed CCS S i ≜ {B i ; ⃗ x i , ⃗ y i , ⃗ z i } tied to the airframe of the MAV i, ∀i ∈ I, with B i coinciding with its center of mass, as depicted in Fig. 3.
The translational kinematics and dynamics of MAV i are described, respectively, bẏ where r i/g g ∈ R 3 , v i/g g ∈ R 3 , and m i ∈ R >0 are, respectively, the MAV i position, velocity, and mass, g ∈ R >0 is the gravity acceleration magnitude, and f c,i g ∈ R 3 and f d,i g ∈ R 3 are, respectively, the control and disturbance forces acting on the MAV i.
On the other hand, the rotational kinematics and dynamics of MAV i are represented, respectively, by [19] where In the proposed scenario, the MAV i moves among static and dynamic obstacles, and is subject to velocity constraint, i.e., where P i (t) ⊆ R 3 is the collision-free space, which varies as the obstacles move, and V i ⊆ R 3 is an admissible convex set. We now define the main problem of the LL control. Problem 1: Execute the HL decisions in the presence of the constraints (5)- (6) and the disturbances f d,i g and τ d,i i . To tackle Problem 1, we adopt the hierarchical control architecture shown in Fig. 4, which contains an outer-loop guidance module and an inner-loop flight control module. Additionally, the high-level to low-level (HL-LL) interface translates the HL decisions into an appropriate target positioň r i/g g ∈ R 3 and target attitudeĎ i/g ∈ SO(3), which are fed to the guidance law. The HL-LL interface is detailed in Section V. This architecture allows to separate the overall control requirements between the modules. In this sense, the role of the guidance module is to conduct the vehicle to the target pose, while avoiding collisions and ensuring the satisfaction of linear velocity constraints. The control module is entrusted with the task of providing stability and robustness for the motion. We assume that the obstacles' (i.e. other MAVs and protected area) position and velocity are available for feedback and are represented by the set O i . A similar architecture has been recently investigated in [20] for underactuated MAVs, using robust model predictive control (RMPC) in the guidance loop.
Let us denote the command of a given variable using an overbar symbol, e.g.,r i/g g andD i/g denote, respectively, the position and attitude commands.
Once the HL decision is translated to a target position and attitude, the guidance law suitably generates the commands r i/g g andD i/g to the flight control.
Under the adopted hierarchical control architecture, Problem 1 can be divided into the following two subproblems: Subproblem 1.1: Design a flight control law to make the MAV i described by (1)-(4) robustly trackr i/g g andD i/g in the presence of the disturbances f d,i g and τ d,i i . Subproblem 1.2: Design a guidance law to generater i/g g andD i/g to the flight control law to make the MAV i reachř i/g g andĎ i/g (translated from the HL decision) while respecting constraints (5)- (6).
To tackle Subproblem 1.1, we design a joint attitudeposition control law using a first-order sliding mode control (SMC) approach [21]. It uses a high-frequency switching control to drive and keep a given system output to a sliding manifold, where the system ideally becomes insensitive to bounded disturbances of the matched type. Specifically, we design a global first-order sliding mode control (GSMC) that guarantees this insensitivity property during all the time by eliminating the reaching phase [22], [23], [24], under the condition that the control commands are sufficiently smooth. On the other hand, to tackle Subproblem 1.2 and guarantees the smoothness requirement of the control commands, we propose a guidance strategy based on the reciprocal acceleration velocity obstacles (AVO) method [13].

B. HIGH-LEVEL DECISION-MAKING PROBLEM
In this section, the problems to be solved by the decision-making of the LW agents are described.
The main mission of the LW agents is to defend both the protected area and the remotely-controlled leading drone against multiple incursions from a swarm of kamikaze drones. Therefore, an LW must effectively engage and disable threats to ensure the safety of both. Furthermore, for this mission to be successful, LW MAVs are expected to have the ability to make intelligent decisions autonomously. Thus, we can define necessary capabilities for this agent, such as the ability to fly in formation with cohesion, and the ability to engage and neutralize imminent threats to guarantee the defense of both the protected area and the leader. To effectively achieve neutralization, agents must have the capability to use embedded weaponry, i.e, use weapons intelligently and strategically, and in extreme cases, sacrifice themselves to neutralize a threat.
The main problem of the HL decision-making can now be defined.
Problem 2: The main problem is to develop an autonomous intelligence module for the LW MAV to successfully achieve the mission objective, aiming at the smallest loss of LW agents during the attack and defense maneuvers.
Based on this problem it is identified a need of defining a set of expected behaviors for the agents. Therefore, we define the following subproblem: Subproblem 2.1 Design basic behaviors for the LW agents and to develop a decision-making architecture to coordinate these behaviors.
To tackle Subproblem 2.1 it is also necessary to design the adversary agents present in the scenario, so we can properly evaluate our solution to Problem 2.
Subproblem 2.2: Design basic behaviors for the kamikaze agents and to develop a decision-making architecture composed of these behaviors.
Therefore, the main agent in our problem is the LW MAV, who will have to perform actions based on the AI-based decision making, as seen in Fig. 5. The action a ∈ A, where A is a set of possible actions, is performed by the agent in a simulated environment which in turn will send the agent's state s ∈ S as feedback. Note that the environment is the scenario of interest and it encompasses all loyal wingmen agents, the leader, and its kamikaze adversaries.
One of the subproblems to be also addressed is the allocation (or assignment) of tasks among the LW MAVs. By task we mean the neutralization of a specific kamikaze threat.
Despite the LW being autonomous, the agents must act cohesively so that cooperation between the MUM-T can arise. One way to approach this is that the MAV leader must be able to command the LWs when needed, so he must be able to distribute tasks to the MUM-T members. This is known as the classical assignment problem which is a widely-studied problem applicable to many domains [25]. It is the problem of choosing an optimal assignment of n ∈ Z + workers to n tasks, assuming that numerical ratings, or costs, are given for each worker's performance on each task. An optimal assignment is the one that makes the sum of the workers's costs for their assigned tasks a maximum (or minimum) [26]. Note that there are n! possible assignments, of which several may be optimal. In this sense, consider the following subproblem. Subproblem 2.3: Given a set of workers W, a set of tasks J , and a set of costs C indicating how effectively each worker w i ∈ W, where i ∈ {1, . . . , n}, can perform each task j i ∈ J determine the best possible assignment of workers to tasks, such that each task is assigned to one worker and each worker is assigned one task, so the total cost is minimized.
One wishes to choose a set of n independent elements (c i,j ) of a cost matrix C, where c i,j is the element of the ith row and jth column of C, so that the sum of these elements is minimized. This can be expressed as permuting the rows and columns of C to minimize the trace of a matrix min L,R Tr(LCR), where L and R are permutation matrices, and the cost matrix is A well-known solution to the assignment problem is given by the Kuhn-Munkres algorithm, colloquially known as Hungarian algorithm, proposed by Kuhn [15] and refined by Munkres [26]. This algorithm solves the assignment problem assuming the existence of a cost matrix C ∈ R n×n , where the element in the ith row and jth column represents the cost c i,j of assigning the jth task to the ith worker.

IV. METHODOLOGY
This section presents the methods used for addressing Problems 1-2. Subsection IV-A describes the method used to solve Problem 1, while Subsection IV-B describes the methods used to solve Problem 2.

A. LOW-LEVEL CONTROL METHODOLOGY
This subsection designs a joint attitude-position control law designed based on a GSMC policy and a guidance strategy based on the reciprocal AVO method for addressing, respectively, Subproblems 1.1 and 1.2.

1) JOINT ATTITUDE-POSITION CONTROLLER
Consider the objective of tracking a time-varying position commandr i/g g and attitude commandD i/g that satisfy the following assumption.
Assumption 1: The time varying commands are such that, at the initial time instant,r i/g , and are at least one time differentiable with respect to time, such that their first-time derivatives are Lipschitz continuous.
Assumption 1 is not restrictive; it just requires the knowledge of the MAV i initial position, attitude, and velocities, and the use of suitable differentiable commands.
To design the controller, we derive the equations describing the control errors kinematics and dynamics. To this end, define a command-related CCS S¯i ≜ B i ; ⃗ x¯i, ⃗ y¯i, ⃗ z¯i representing the commanded position and orientation for the MAV i body-fixed frame S i . The attitude and angular velocity control errors can be defined, respectively, as [12], [19], and [27] are, respectively, the attitude and angular velocity commands of the MAV i.
Similarly, let us define the position and linear velocity control errors, respectively, as wherer i/g g ∈ R 3 andv i/g g ∈ R 3 are, respectively, the position and velocity commands of the MAV i.
The time derivatives of (8)-(11) can be put, using (1)-(4), into the following state-space model: 9534 VOLUME 11, 2023 From (13), it can be immediately seen that the term d i is of the matched type, i.e., it belongs to the span of the input matrix B i . Consequently, a first-order SMC law is able to guarantee the robustness (or invariance) of the closed-loop system with respect to d i [28]. Now, let we define the sliding variable where C i ∈ R 3×3 . The corresponding sliding set is From Assumption 1, note that the system is in the sliding set S i at the initial time instant. Therefore, by designing the control u i such that the inequality,V ≤ −βV 1/2 [29], with β ∈ R >0 , is satisfied from the initial time instant, it holds that (x i 1 ,ẋ i 1 ) ∈ S i during all the time, and consequently (x i 1 (t),ẋ i 1 (t)) = (0 6 , 0 6 ), ∀t ≥ 0. Therefore, from (14), it holds that x i 2 (t) = 0 6 , ∀t ≥ 0. In other words, the system is capable of exactly tracking the position, attitude, linear velocity, and angular velocity commands, i.e., r By differentiating (15) with respect to time and substituting (12)-(13) into the resulting expression, we can obtain the dynamic equation for s i (we omit the function independent variables for conciseness): Regarding the disturbance d i consider the assumption: Assumption 2: The disturbance d i is bounded according to is a known vector with positive components.
By choosing the control law guarantees the existence of a global sliding mode of s i in S i , thus completing the objective of Problem 1. The proof is omitted here but is analogous to the one presented in Section 3.1 of Reference [30]. For implementing the control input u i , it is necessary a control allocation module, which, for simplicity, has been omitted in Fig. 4. The control allocation generates individual throttle commands to the rotors to make the effective resultant control force and torque close to the respective command. For details about the control allocation of fully actuated MAVs with fixed rotors, one can see reference [12]. In particular, in the simulation study of the present paper, as we have considered a non-redundant fully actuated vehicle containing six rotors, the respective control allocation matrix is a square one, and, consequently, the allocation is immediately solved by, first, inverting that matrix and, second, converting the obtained rotor thrusts into throttle commands. For this end, as usual, we assume that the ith rotor thrust is described by where k f is a force coefficient, ϖ i is the rotation speed, and i ∈ {1, 2, . . . , n r }, being n r ∈ Z >0 the number of rotors.

2) GUIDANCE STRATEGY
This section formulates a guidance strategy based on the reciprocal AVO method to solve Subproblem 1.2.
Let us define a CCS S p ≜ P; ⃗ x p , ⃗ y p , ⃗ z p fixed to a protected area reference point P, with ⃗ z p oriented upwards, parallel to the local vertical. In the proposed scenario, an LW considers the other MAVs and the protected area as obstacles, while a kamikaze only considers the other kamikazes as obstacles since their main goal is to collide against the MUM-T and the protected area. In this sense, we can define a set I o i containing the ID of all obstacles considered by the MAV i [31]. Now, since the leader is remotely controlled, the guidance strategy presented here for the MAV i is only implemented on the loyal wingmen and the kamikazes, i.e., i ∈ I\ℓ, where ℓ is the leader ID. Moreover, it is important to highlight that, the smoothness requirement of the leader commandsr ℓ/g g andD ℓ/g can be achieved by properly generating the leader commands.
In order to guarantee the smoothness requirement of the commandsr i/g g andD i/g of Assumption 1, we propose a guidance strategy that generates these commands using a position and an attitude reference filters, as depicted in Fig. 6. Then, we use the AVO method to choose a target velocity v i, * g ∈ R 3 aiming to avoid collision with the obstacles and respect the velocity constraint (6).
The attitude reference filter is designed using a harmonicjerk s-curve trajectory generation algorithm [32]. On the other hand, to fulfill the smoothness requirement of the position command, the position reference filter generates the position command for the MAV i by integrating twice the following acceleration command: where δ i ∈ R >0 . VOLUME 11, 2023 From (17), it can be seen that by limiting v i, * g ,v i/g g is bounded. Therefore, by choosingr Consider that the MAVs and the protected area can be circumscribed by spheres. In this context, a collision might occur between the MAV i, where i ∈ I\ℓ, and Obstacle j, where ρ ij ≜ ρ i + ρ j , being ρ i ∈ R >0 the radius of the sphere enclosing MAV i and ρ j ∈ R >0 the radius of the sphere enclosing Obstacle j.
Due to the existence of the global sliding mode in the flight control loop, and considering v i, * g as a constant input, an expression for the MAV i position can be obtained by integrating (17) two times with respect to time, being given by where t 0 ∈ R >0 is an initial time instant, t ≜ t − t 0 , and Similarly to the original VO approach [33], we consider that the Obstacle j keep its velocity constant, i.e., its position can be written as r j/g g (t) = r j/g g (t 0 ) + tv j/g g (t 0 ). Then, substituting (19) and r j/g g (t) into the collision condition (18) and dividing both sides of the resulting expression by t + where .
The AVO defines a set of target velocities for the MAV i that will result in a collision with the Obstacle j within a time horizon τ ∈ R >0 , if the Obstacle j keeps a constant velocity. It can be defined from (20) as The procedure is analogous to the original VO method: continually selecting a velocity outside AVO τ i,j guarantees that the MAV i and Obstacle j will not collide. By using this procedure considering all the obstacles, we guarantee collision-free guidance for the MAV i and consequently satisfy the position constraint (5).
The above procedure is suitably for passive obstacles, i.e., obstacles that do not change their motion based on their surroundings. However, if the moving obstacles are entities capable of making decisions, then this reactive nature has to be considered. This problem is referred in the literature as reciprocal collision avoidance [34] and can be incorporate into the presented methodology by modifying the AVO set relative to the non-passive obstacles (see Reference [13]).
In the proposed scenario, an LW considers the other LW as non-passive obstacles and the kamikazes as passive ones. On the other hand, a kamikaze considers the others kamikazes as non-passive obstacles.
To account for the velocity constraint (6), it has to be explicitly rewritten in terms of the target velocity v i, * g . To this end, note that, from (17), by choosing v i, * g ∈ V i , it holds that v i/g g ∈ V i since V i is a convex set. Then, due to the existing of the global sliding mode, one can show that by choosing , guarantees the satisfaction of the position and velocity constraints (5)-(6).

B. HIGH-LEVEL DECISION-MAKING METHODOLOGY
In this section, we discuss the methodology used to address Problem 2. For that, we focus on the artificial intelligence of both types of agents present in the scenario of interest. The development of the decision-making architectures of the agents is based on our previous work [35], where a prototype of the scenario was developed, in order to have an initial decision architecture of the LW and kamikaze agents.
Many approaches to robotic decision-making have been developed; in modern applications such as autonomous cars, the agents are highly deliberative [36]. In these complex scenarios, decision-making is further subdivided into modules tailored to solve subtasks, which are referred to as behaviors [37]. Once this decomposition is performed, we need to orchestrate the execution of these behaviors. Many behavior selection mechanisms exist, with finite-state machines (FSMs) [38] and behavior trees (BTs) [39] being the most popular techniques. The design of these mechanisms is empirical and is mainly based on intuition, creativity, experience, and good practices [38]. Furthermore, this process is iterative, with the agent performance being evaluated by an expert or through statistics, in order to select the best agent architecture [39]. On the other hand, optimization methods and machine learning techniques may be employed to tune decision parameters [35] or to learn complete behaviors with no prior knowledge [40].
To start addressing Problem 2, we need the kamikaze agents to be functional via simplified but effective AI. Consequently, we selected the FSM technique to develop the decision-making module for the kamikaze. The FSMs are the most common model of computation for modeling decision-making architectures for simple AI [39], [41], due to its easy implementation and intuitive structure. In this technique, a state s represents a behavior for the agent. A FSM can switch between behaviors in response to events. In Fig. 7, we present the decision-making developed for the kamikaze MAV, note that each MAV contains an identical decision module. In our solution, the kamikaze has a set K of four behaviors, the initial behavior is IdleState, where the agent is idle for t 1 seconds and then selects a target to attack based on the probabilities p 1 , p 2 , and p 3 , which in this work are equally distributed probabilities used for each target type. Once a target is selected, the FSM will transition to the representative state of the selected target and will remain in that state until the agent is destroyed. The exception is Attack Loyal Wingman; this behavior targets the closest LW, and will select the closest LW every t 2 = 1 s, to avoid frequent target switching. Once there are no more LW, the FSM transitions to attack the leader or the protected area with a probability of 50% each.
In contrast, the behaviors to be executed by the LW MAV are more complex given their desired capabilities, consequently, its AI architecture requires further elaboration. To effectively solve Problem 2, we implement its decision-making module using the BT technique, due to its inherent advantages in comparison to FSMs, such as behaviors that are highly modular, reactive, and flexible to changes [14], [39]. We briefly present the basic BT framework used, but keep in mind that alternatives commonly employ extensions [14], [42]. A BT is composed by nodes of two types: composite or leaf. Composite nodes control the BT logic, while leaf nodes execute the behavior modules or condition checks. When executed, each node returns one of the following execution statuses: Success, Failure, or Running. Table 1 presents the return status logic of each node type.
Therefore, based on Problem 2 description, we can identify the modular behaviors for the decision-making of a LW, in Table 2. We present three movement behaviors: Chase Threat, Go To formation, and Approach Formation; and three offensive behaviors: Vaporize Threat, Freeze Threat and Sacrifice Attack. Fig. 8 presents the behavior tree for a LW MAV using the behaviors described in Table 2. Note that every LW has an identical and independent copy of this decision-making module. This decision-making module will run by default the Go To Formation behavior, where the LW keeps a flying formation surrounding the leading drone while there are no threats identified. Whenever a kamikaze drone crosses the engagement range or the LW is allocated to a task, it enters into the Chase Threat behavior, for which the vaporizer weapon must be available. In this case, the agent leaves formation to pursue the threat and tries to neutralize it using available weapons. Once a kamikaze is in range of use of the weapons, the LW selects a neutralization method. The selected method may be the vaporizer gun, the freezing gun or a sacrifice as last resort when there is no remaining ammo. A weapon is available if there is still ammunition and if it is not cooling down after being fired. Note that the engagement distance and the range of use of both weapons are decisionmaking parameters, with the latter being less or equal than the nominal range of the weapons, thus leaving the use of weapons as a strategy criterion.
The neutralization strategy embedded in this BT is to first select the closest threat within range of his freezing gun (midrange) and then try to freeze it. Once the threat is slower, it is safer to approach it, then the LW uses its vaporizer gun (short-range) to eliminate the threat. Note that the respective gun must be available at the time that this weapon needs to be used. The SacrificeAttack behavior is executed when there are no remaining ammo in both weapons, it causes the LW to collide with an imminent threat, thus using its body-frame as a protective weapon. Immediately after the threat is eliminated or becomes out of range, the LW returns to formation, thus ensuring the protection of the leader. In order to join the formation, the LW must following the sequence present in Fig. 9. Initially, the LW must first approach the formation from a safe distance d f ∈ R >0 and send a request to the leader asking for permission to rejoin. The leader receives VOLUME 11, 2023  and processes the request, allowing or not the entry into formation. The LW waits for the permission to be granted and for the coordinates in the formation. The safe distance d f is a parameter.
Note that the decision-making modules described in this section make decisions based only on its internal state and the states of the other agents present in the scenario, which are received through messages from the ground assets. Also, it is important to reiterate that the leader is capable of overriding the decisions made by the module of a specific LW, by either enforcing the loyal wingmen to return to formation, or selecting a specific LW to engage a selected kamikaze.

C. FORMATION DEFINITION
In this work, we consider that the leader defines the formation pattern as an uniform-circular formation along the local horizontal plane with a desired radius r f > 0 ∈ R >0 , as shown in Fig. 10. Given that there are a total of n f ∈ Z >0 LW in formation, the positions occupied by the loyal wingmen are defined by the set where D 3 (·) ∈ SO(3) is the rotation matrix around ⃗ z g , α = 2π/n f , and φ ∈ [−π, π) is an offset angle that can be used to rotate the formation around the leader. The assignment of the LW to the available positions in formation is sequential, based on the order of the acknowledged request to join the formation by the leader, and not necessarily match the vehicle ID.

D. LW-KAMIKAZE TASK ALLOCATION
This subsection addresses the solution of Subproblem 2.3, which employs the Hungarian algorithm for the task assignment between the LW. As seen in Fig. 11, the workers in our context are interpreted as the LWs available and the tasks are the neutralization of the kamikaze threats. The distance of the loyal wingman to a threat can be described as the euclidean distance d(p, q) ≜ p i − q j , where p i ∈ R 3 and q j ∈ R 3 are the current position of the ith LW and the jth kamikaze threat, respectively. So the cost of the ith LW to neutralize the jth kamikaze is where A v ∈ {1, 0} is a variable that is 1 if the LW weapons are available, and 0 otherwise. The constant P represents an additional penalty cost to assign a LW that does not have a available weapon, as it is important to avoid the allocation of unavailable LWs. Lastly, the engagement range is represented by the radius d a > 0, and the threats outside this range will not be considered during the assignment. The execution of the Hungarian algorithm outputs the lowest cost path through the C matrix, representing the optimal LW-Kamikaze assignment. In the example provided in Fig. 11 we have the optimal LW-Kamikaze task allocation indicated by the blue arrows, note that the threat k 4 is not considered, since it is outside d a .

E. COOPERATION AND SELFISHNESS BEHAVIORS
The CEC is an emerging systems-of-systems capability in which multiple systems coordinate their actions through a network to improve their abilities to effectively perform a given task [1]. Highly developed systems of cooperation and mutual support can be found in different sorts of systems, a general consensus is that cooperation is in general advantageous for the group of cooperators as a whole, even though it may curb some individual's resources [43].
Altruism means that a system performs actions that increase the fitness of another system. The term Weak altruism can be defined as behavior that benefits more to another individual than to the individual performing the behavior. A Strong altruism denotes behavior that benefits others, but at one's own cost [44]. Both are common and necessary in those highly cooperative systems, which often have a collective organization with full division of labor, including individuals who are only defended by others, i.e, the MAV leader, or who are prepared to sacrifice themselves for the defense of others, i.e, the LW who sacrifices itself in extreme cases to ensure the protection of the leader. These kind of systems are found in certain species of insects, and in human society [44].
On the other hand, the term Selfishness means that the system will only perform actions that increase its own fitness [43], naturally selected systems will not only be selfish, since they try to optimize their own fitness, but they will also tend to avoid altruism. Thus, one's own fitness is indirectly reduced by altruistic behavior.
In our context, we can illustrate the cooperative behavior as seen in the following Fig. 12. In the frame (a) we have a LW 1 allocated by the leader to neutralize the threat K 4 , so here we have the LW 1 acting in an altruistic way for the benefit of the defense of the leader, also note the existence of another threat called K 1 in this frame. During the displacement towards the threat K 4 , seen in frame (b), the threat K 1 enters the selfish engagement radius (d s ), in this case the LW 1 identifies the existence of a threat within d s and starts behaving selfishly. In this case the LW 1 decides to temporarily overwrite the command given by the leader, and neutralize K 1 which presents a great risk to itself. Finally, we have the frame (c), demonstrating that once the threat K 1 has been destroyed, the LW 1 starts to follow its altruistic behavior again, where it will neutralize the threat commanded by the leader.
Therefore, as demonstrated, the developed decision-making can behave both altruistically and selfishly. The altruistic behavior being the leader's defense, even if it leads to his own destruction. And the selfish behavior, where he overwrites the leader's assignment, thus ceasing to cooperate, and starts to chase the neutralization of the highest risk threat, which is the closest to him. Once it neutralizes the highest risk threat, it goes back to cooperate with the system as a whole, seeking to neutralize the threat allocated by the leader.

V. DECISION-CONTROL INTERFACE
As we have two main layers of software, it is necessary to define a communication interface between such modules, i. e., how the decision commands are translated into control commands for the MAVs.
As defined in Fig. 7, the kamikaze MAVs have four behaviors: idle State, Attack Leader State, Attack Loyal Wingman State, and Attack Protected Area State. The HL-LL interface of the kamikazes translates these behaviors only into target position commands since the kamikaze behaviors does not demand attitude changes.
The idle State demands the kamikaze to stay stationary in its current position, and to do so the target position informed to the guidance algorithm must be equal to the kamikaze position, i.e.,ř j/g On the other hand, Attack Leader State, Attack Loyal Wingman State, and Attack Protected Area State can be executed by generating the target position aš where k is the ID of the desired target. As shown in Table 2, the LW has four movement behaviors: Chase Threat, Go To formation, Approach Formation, and Sacrifice Attack. VOLUME 11, 2023  To execute the Go To Formation, the HL-LL interface of the LW informs the target position to the guidance algorithm asř where i ∈ I m \ℓ andř i/l g ∈ R 3 is a target position relative to the leader given by the HL decision-making layer.
Similarly, to execute Approach Formation, the target position is calculated aš Since the formation is defined on the local horizontal plane, the LW approaches the formation by taking a vertical offset relative to its target position in formation for simplicity.
To execute the Chase Threat, the target position informed to the guidance algorithm of the LW must be set as an offset relative to the target kamikaze position, i.e., where i ∈ I m \ℓ, k ∈ I k is the ID of the target kamikaze, and d t ∈ R >0 is the distance to be kept from the target. The Sacrifice Attack is executed by setting d t = 0.

VI. COMPUTATIONAL ARCHITECTURE
This section describes the computational architecture used to implement the proposed scenario. A simulation framework is built as a proof of concept for the development of the proposed techniques. The infrastructure is divided into two PCs, as seen in Fig. 13. The first is responsible to compute the LL control, physics, 3D visualization, and interface with the human pilot through a joystick. The second one is responsible to run the HL decision-making modules of the agents being simulated. The computers communicate on a local network using the Robot Operating System 2 (ROS 2) middleware which is built on an open-source data distribution standard (DDS) middleware that provides features such as discovery, serialization, and transportation. 1 The LL control, physics, and interface with the human pilot through a joystick are coded in MATLAB. The 3D simulator was developed using the Unity engine to allow a proper visualization of the proposed scenario. On the other hand, the 1 ROS2 Documentation: https://www.ros.org decision-making algorithms were implemented in the Python programming language, due to its large community, support, and availability of AI libraries, especially for machine learning.
The data communication interface between the HL decision-making layer and the LL control layer is implemented using the framework provided by ROS 2 [45]. It is based on topics used to publish and receive customized messages in a publish-subscribe pattern that allows the layers to share important information asynchronously. The customized messages contain information regarding the action commands from the HL decision-making module and the status update of the vehicles calculated by the LL control and simulation sub-module, as described in Table 3. For ease of implementation, each vehicle is identified by a unique ID. The LL control and simulation sub-module also sends unidirectional information containing the pose of the vehicles via socket to the 3D simulator that works as a visual feedback to the pilot.

VII. SCENARIO SIMULATION AND RESULTS
In this section, we describe a series of simulations used to evaluate the overall method. Subsection VII-A shows the results of a formation flight simulation. Subsection VII-B presents the qualitative results of the HL decision making. Subsection VII-C presents a Monte Carlo simulation. Lastly, Subsection VII-D shows the computational burden of the overall simulation.
In all simulations, the MAVs are non-planar fully actuated octocopters, as the one depicted in Fig. 1, with a total mass  Moreover, the MAVs are subject to a sinusoidal force disturbance with the same amplitude of 0.6 N but with different phase shifts, and to a sinusoidal torque disturbance with the same amplitude of 0.05 Nm but also with different phase shifts.
The LL control and simulation, coded in MATLAB, use the Euler integration method and runs with a frequency of 100 Hz. The actuator dynamics are modeled by first-order differential equations with a time constant of 0.01 s. The force and torque coefficients of the rotors are 1.2838 × 10 −5 N/(rad/s) 2 and 3.0811 × 10 −7 Nm/(rad/s) 2 , respectively. It is worth mentioning that the adopted model is typically used in the MAV literature to describe the set consisting of a brushless motor, an electronic speed controller, and a propeller [46].

A. LOW-LEVEL CONTROL SIMULATION
To evaluate the LL control methodology, we simulate four LW flying in a circular-uniform formation around the remote-controlled leader in a scenario without kamikazes. Since the MAVs use the same flight control law, we evaluate the tracking performance of the LL control method by only plotting the tracking errors for the LW 1. The simulation of this flight can be seen in this video. 2 Fig. 14 shows the position and attitude tracking performance of the LW 1. It can be seen that the vehicle performs an accurate position and velocity tracking, being the chattering [47], which is a high-frequency oscillation around the sliding manifold caused by unmodeled phenomena, such as time discretization, actuator dynamics, and sensor noise, more evident in the velocity components. It can also be seen that the attitude variables have a chattering with a magnitude smaller than two degrees in each component. Moreover, note that at the time instant 5.6 s, the attitude 2 https://youtube.com/clip/UgkxHcMf1eOj chattering is amplified since at this instant the vehicle is called to formation, i.e., it starts receiving the leader attitude and angular velocity references that also present chattering.

B. HIGH-LEVEL DECISION MAKING QUALITATIVE RESULTS
In this section, we describe the qualitative results of a series of simulations conducted using the proposed computational architecture for 4 LW against 2 kamikazes. This simulations were used to test the effectiveness of the overall method, i.e., the effectiveness of the proposed LL control method and the LW BT architecture. The results described in this subsection can also be seen on this video. 3 Moreover, the simulations were used to empirically find an effective LW decision-making architecture, while developing the behaviors and tuning the decision-making parameters. Subsequently, we demonstrate operating sequences found during the execution of the scenario.
First, in Fig. 15, we see the simulation result of the case where a kamikaze threat is identified and neutralized. Initially, in frame (a) we see the MUM-T in formation and an identified kamikaze threat in red. The LW MAV closest to the threat leaves the formation and chases the kamikaze by executing the chase threat behavior. As a result, all remaining LW MAV members in formation will readjust to new positions defined by the leader, so no gaps for attacks are present.
Consequently, in frame (b), we have the LW that left the formation attacking the kamikaze with the freezing gun, this strategy aims to facilitate the approach for neutralization since this weapon considerably reduces the kamikaze's speed. Now in frame (c) the LW gets even closer to the frozen kamikaze and uses its vaporizer gun to effectively destroy it. Consequently, we see in frame (d) that the kamikaze is destroyed and the LW returns to formation, initially the LW  must approach the formation from a safe distance executing the approach formation and request re-entry for the leader, who must allow its entry and readjust the formation with the addition of this agent. In frame (e) we see the LW successfully re-entering the formation and sequentially executing the behavior go to formation. Finally, in frame (f) we see all the agents keeping formation after the success of the mission.
Furthermore, in Fig. 16, we observe two sequences of cooperative neutralization that emerged during the simulations, where two loyal wingmen performed tasks together to neutralize kamikazes. In the first sequence, frames (a)-(c), we have LW1 freezing the kamikaze from a longer distance, consequently, we observe LW2 in (b), which also identified the same threat, neutralizing it using the vaporizer gun. In the second sequence, frames (d)-(f), we see a similar case from a different view. In both cases, the participating LW1 and LW2 take complementary actions to eliminate a common threat. Now we present the sequence leading to the sacrifice of an LW, in Fig. 17. The case begins in frame (a) with three loyal wingmen in formation, after LW4 is destroyed. In frame (b), we observe the LW1, which only has one cartridge remaining in its weapons, engaging two kamikazes, note that after the neutralization of the first kamikaze, it collides with the other one present in the attack, thus eliminating two threats. In frame (d) we have LW1 out of ammo and starting its sacrifice attack after a new threat enters the area. In frame (e) we have the successful elimination of the kamikaze, and consequently, the leader readjusts the relative position in formation of LW2 and LW3. Finally, in frame (f) we have a new MUM-T formation composed only by the remaining LW2 and LW3.
Unexpected behaviors can emerge, as illustrated in Fig. 18, where we observe two kamikazes heading toward the leader, in frame (a). In the sequence, in frame (b), we have LW1 identifying and freezing the closest threat K1. As a consequence, we observe that K1 collides with K2, since K1  has been frozen and K2 which is closely following K1 cannot avoid the collision in time. In this way, we can highlight this emergent behavior as an indirect neutralization method.

C. MONTE CARLO SIMULATIONS
The effectiveness of the overall method is evaluated using a Monte Carlo simulation with 122 iterations. These simulations are performed considering 4 LW against 3 kamikazes (that respawn once neutralized) with the protected area located at the origin and the leader hovering at the point (5, 0, 5) m. The LW have a maximum speed of 1.5 m/s, being the kamikazes 50% faster. The freezing and vaporizer guns have an ammo of 10, a cooling down time of 1 s, and a hit probability of 95%. The freezing gun can reduce the kamikaze speed by half during 5 s. Fig. 19 shows the Monte Carlo simulation results. It can be seen that, on average, the survival time is 169.3 s and the number of kamikazes destroyed is 34, without considering the sacrifice attack behavior, which is an excellent result since the MUM-T can only directly neutralize a maximum of 40 kamikazes given that each LW is equipped with a vaporizer gun with an ammo of 10. This corresponds to 85% of the total neutralizing capability.

D. COMPUTATIONAL TIME
To evaluate the computational burden of the overall simulation, we conducted a typical run using the same parameters adopted in Subsection VII.C. For this run, both the LL and the HL modules for all the MAVs were implemented in the same computer, which is a laptop equipped with a 12th generation VOLUME 11, 2023  Intel Core i7-12700H, with 2.30 GHz and 16 GB of RAM. The total simulation time, from the beginning to the leader destruction, was about 131 s. In only 3.4% of a total of 13100 time steps, the sampling time of 0.01 s was exceeded.
Further, similar to the computational analysis done in [48], Fig. 20 shows a plot of the iteration time over the simulation time, together with the respective average and the adopted sampling time. From this plot, we can see that, in fact, in most of the iterations, the adopted sampling time is not violated. It is worth mentioning that, these results are satisfactory since real time is reasonably achieved even using a code that is not optimized for embedded implementation.

VIII. CONCLUSION
This paper evaluated the concept of CEC applied to a MUM-T. To this end, we proposed a defensive scenario where loyal wingman MAVs cooperate to defend a manned leader and protect a critical infrastructure from a swarm of kamikazes with explosive capabilities. To deal with such threats the loyal wingmen have the capability to engage and neutralize the threats, using two idealized weapons, a vaporizer gun representing a short-range weapon capable of neutralizing a threat with a single shot, and a freezing gun, a non-lethal weapon of mid-range, capable of slowing down the threat. To reduce the complexity of this problem, all the vehicles are modeled as fully actuated MAVs, and the problem is broken down into two parts, one involving an LL control layer, and another involving an HL layer of intelligent software capable of making autonomous and decentralized decisions. The LL control designs a joint attitude-position control law using a first-order sliding mode control that guarantees the closed-loop system robustness with respect to model uncertainties and disturbances. On the other hand, the decision-making for loyal wingman agents was developed using the behavior tree technique and reduced to modular behaviors, such as: chasing a threat, remaining in protective formation, approaching formation to initiate rejoin procedure, and attack behaviors using the available weapons or using the sacrifice attack in the extreme case where there are no more weapons available. The method is analyzed using a high-fidelity 3D simulator and shown to be effective. In future works, we plan to increase the complexity of the kamikazes' behaviors in order to improve the difficulty of the proposed scenario and investigate different formation patterns. We also plan the development of consensus algorithms using a world model that is not synchronized among the MUM-T agents. Moreover, more complex scenarios can be defined involving different types of threats and drones.
This paper reported a low technology readiness level (TRL) researh project, which is expected to move to a higher TRL in future works. Moreover, in an immediate future phase, this project will focus on state estimation (navigation and tracking) as well as on distributed hardware-in-the-loop simulation. He is currently a Professor at ITA, where he is also a member of the Autonomous Computational Systems Laboratory (LAB-SCA) and leads the robotics competition team-ITAndroids. His research interests include humanoid robotics, mobile robotics, dynamical systems control, and artificial intelligence.