Multi-UAV Collaborative Trajectory Optimization for Asynchronous 3-D Passive Multitarget Tracking

This article considers the 3-D collaborative trajectory optimization (CTO) of multiple unmanned aerial vehicles to improve multitarget tracking performance with an asynchronous angle of arrival measurements. The predicted conditional Cramér–Rao lower bound is adopted as a performance measure to predict and subsequently control tracking error online. Then, the CTO problem is cast as a time-varying nonconvex problem subjected to constraints arising from dynamic and security (height, collision, and obstacle/target/threat avoidance). Finally, a comprehensive solution method (CSM) is presented to tackle the resulting problem, according to its unique structures. Specifically, if all security constraints are inactive, the CTO can be simplified as a nonconvex problem with convex dynamic constraints, which can be solved by the nonmonotone spectral projected gradient (NSPG) method. Oppositely, an alternating direction penalty method (ADPM) is presented to solve the CTO problem with some positive security constraints. The ADPM introduces auxiliary vectors to decouple the complex constraints and separates the CTO into several subproblems and tackles them alternately, while locally adjusting the penalty factor at each iteration. We show the subproblem w.r.t. the position vector is nonconvex but with convex constraints, which can be efficiently solved by the NSPG method. The subproblems w.r.t. the auxiliary vectors are separable and have closed-form solutions. Simulation results demonstrate that the CSM outperforms the unoptimized method in terms of tracking performance. Besides, the CSM achieves the near-optimal performance provided by the genetic algorithm with much lower computational complexity.


I. INTRODUCTION
A. Motivation and Related Work T RAJECTORY optimization (TO) aims to predetermine the waypoints of unmanned aerial vehicles (UAVs) to achieve various tasks, such as minimizing fuel/battery consumption [1], maximizing search area coverage [2], or improving localization/tracking performance [3], [4], and so on. Passive tracking of radio emitters/targets with UAVs is among these tasks, which has wide applications in military and civilian fields, including target tracking in electronic reconnaissance and mobile user tracking in a wireless communication network [5] and so on. The sensing capabilities of UAVs depend not only on the measurement quality of payload passive sensors but also on the geometry of multi-UAV [6]. Therefore, one of the main research challenges is how to automatically optimize the trajectories of UAVs online to maximize tracking accuracy while simultaneously meeting the constraints deriving from speed, turning angle, height, collision, and obstacle avoidance.
To tackle this challenge, several TO models and corresponding solution algorithms are proposed for target localization/tracking in [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. Tzoreff and Weiss [7] discuss online and offline TO problems for a single UAV in the presence of time of arrival (TOA) measurements, subject to speed and no-fly zone constraints. The proximal policy method and the policy rollout algorithm are applied to TO problems to speed up the localization of an emitter with angle of arrival (AOA) measurements in [8] and [9], respectively. These TO schemes for single-UAV [7], [8], [9] do not require consideration of collision avoidance and communication distance constraints, whereas these constraints should be considered in multi-UAV collaborative TO (CTO) problems [10], [11], [12], [13], [14]. A receiver TO problem subjected to the minimal distance allowed to the emitter is considered in [10] to improve the localization performance, which is solved by the projected gradient (PG) method. Then, Dogancay [10] extends their work to a multitarget localization environment [11]. In [12], a steering control approach is proposed for a pair of UAVs in the time difference of arrival (TDOA)-based localization of a single emitter. In [13], a noncausal TO scheme is addressed for the received signal strength (RSS)-based localization to obtain This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ a higher final localization accuracy. A TO problem for multiple UAVs with heterogeneous payload sensors is considered in [14] to maximize the localization performance of an emitter while avoiding threats and ensuring effective communication between UAVs. However, the previous works [10], [11], [12], [13], [14] only optimize the steering of multi-UAV and do not take full advantage of the degree of freedom concerning speed.
An offline 2-D TO problem for a single-emitter location utilizing two moving AOA sensors is considered in [15] to improve the localization performance. In this article, the TO is formulated as a semidefinite programming problem with several constraints rising from dynamic, collision, and obstacle limitations, which is tackled by the alternating direction method of multipliers [27] (ADMM). However, this work [15] assumes a constant covariance of the measurement noise, but it decreases as the UAV approaches the target in practice. Besides, the convergence of ADMM for solving nonconvex problems is related to initial points and penalty factors, tuning of these parameters is needed for a good performance. The adjustment of these parameters relies on experience and is scenario-dependent.
Different from localization tasks, a prediction dynamic model should be considered in tracking applications, which would expand the uncertainty in the state estimation over time due to the target motion. Hernandez [16] addresses an unconstrained TO problem for AOA sensors mounted on multiple UAV platforms and proposes a quadrant search method that iteratively restricts the search to the most promising quadrant. In [17], [18], [19], [20], several joint TO and resource allocation problems are investigated to improve tracking accuracy, which is solved by the cyclic minimization-based methods and genetic algorithm (GA). A CTO problem is considered in [21] to track a moving emitter using RSS sensors, which is solved by the model predictive method. In [22], a centralized steering optimization scheme is presented for a pair of UAVs to improve tracking accuracy. For the same purpose, a heuristic planner is proposed in [23] to guide the waypoints of multi-UAV while avoiding no-fly zones. However, collision and obstacle avoidance is not fully investigated in [16], [17], [18], [19], [20], [21], [22], [23]. The TO problem is formulated in a partially observable Markov decision process framework in [24], where the goal is to improve tracking performance while achieving threat and collision avoidance. A CTO problem is addressed in [25] to minimize the overall tracking error of multiple emitters while avoiding no-fly zones. This work assumes that each UAV simultaneously intercepts signals from multiple emitters. However, the transmit signals are usually asynchronous across multiple emitters in practice.
Most of the previous works [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25] focus on 2-D TO. On the one hand, the 2-D TO ignores the possibility of exploiting the altitude dimension to improve tracking performance. On the other hand, the increased dimensions of variables and the intercoupling of multiple nonconvex constraints make the online 3-D CTO problem more challenging. A 3-D CTO problem subject to communication range and no-fly zone constraints is considered in [26] for target tracking. In this work [26], the 3-D trajectories are decomposed into two 2-D trajectories in the horizontal and vertical directions and optimized by gradient-descent and grid search methods, respectively. The premise of this decomposition technique is that the velocity constraints in the horizontal and vertical directions are not coupled. This assumption may not hold for some types of UAVs, such as fixed-wing UAVs. In addition, the proposed model in [26] is not suitable for multiple target tracking (MTT) problems.

B. Main Contributions
This article considers a 3-D CTO for multi-UAV armed with AOA sensors 1 in asynchronous passive MTT environments. The major contributions are threefold: 1) A closed-loop asynchronous tracking framework is built for CTO. In this framework, the maximum likelihood (ML) estimator [28] is applied to obtain the composite measurement (CM) from the current AOA measurements spanning the fusion time interval. Then, the CMs are processed by the Kalman filter (KF) to estimate and predict the states of multiple targets [29]. According to the prediction information, we calculate the predicted conditional Cramér-Rao lower bound (PC-CRLB) and adopt it as a performance measure to precontrol the tracking error online [30]. Finally, the waypoints are optimized in the fusion center and then fed back to multiple UAVs 2) To improve the overall MTT performance with asynchronous AOA measurements, we model the 3-D CTO as a time-varying nonconvex problem subjected to dynamic and security constraints. Since the arrival time of measurements from different targets w.r.t. each sensor is random, existing researches [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26] cannot be applied to the 3-D CTO problem in the passive MTT context. We introduce the fusion time interval for time alignment and implement CTO in each fusion time instant. The dynamic constraints deriving from the speed and turning angle are represented as convex sets. The time-varying security constraints arising from the height, collision, and obstacle avoidance may not work at some time instants, leading to the formulation of the 3-D CTO as a time-varying nonconvex problem, since the PC-CRLB is nonconvex w.r.t. waypoints of multi-UAV. 3) A fast comprehensive solution method (CSM) incorporates the nonmonotone spectral PG (NSPG) method and the alternating direction penalty method (ADPM) is proposed to solve the CTO problem, by exploring its timevarying characteristic. The CSM begins with a judgment of which security constraints are active. If all security constraints are inactive, the CTO can be reduced to a nonconvex optimization problem with convex dynamic constraints, which can be efficiently solved by the NSPG method [31] with the proposed closed-form projection operator. In the case that some security constraints may be activated during the CTO process, we introduce some auxiliary vectors for the positive security constraints to decouple them. Then, we reformulate the CTO as an equality-constrained nonconvex optimization problem, which can be handled by the ADPM framework [27]. The ADPM separates the CTO into several subproblems, which can be tackled by the NSPG method or have closed-form solutions.
The remainder of this article is organized as follows. The system description is given in Section II. A two-step method for passive MTT with asynchronous AOA measurements is introduced in Section III. The formulation of the CTO scheme and the corresponding solution technique are presented in Section IV. Several simulation results are analyzed in Section V to demonstrate the effectiveness of the CSM. Finally, Section VI concludes this article.

II. SYSTEM DESCRIPTION
A potential scenario of multi-UAV multitarget tracking is shown in Fig. 1, where Q widely separated point targets are considered to be tracked by N UAVs. Each UAV is equipped with a passive sensor that can intercept the signal emitted by multiple targets and provide AOA measurements. Note that the signal launch time of multitarget is different, leading to the measurement arrival time for different targets w.r.t. each sensor is asynchronous, details are presented in Section II-C. To collaboratively track multiple targets, a fusion time interval T 0 is specified for simultaneous multitarget tracking in CTO. At the fusion time instant t k (t k = t k−1 + T 0 ), the position x p q,k = (x q,k , y q,k , z q,k ) T and the velocity x v q,k = (ẋ q,k ,ẏ q,k ,ż q,k ) T of target q ∈ {1, 2, . . . , Q} are unknown parameters to be estimated, then x q,k = (x q,k ,ẋ q,k , y q,k ,ẏ q,k , z q,k ,ż q,k ) T is defined as the state of target q. The position and velocity of UAV n ∈ {1, 2, . . . , N } are p n,k = (x n,k , y n,k , z n,k ) T and v n,k = (ẋ n,k ,ẏ n,k ,ż n,k ) T , respectively.

A. Dynamic Model for Target q
The dynamic of target q is described as the nearly constant velocity model [32] x q,k = F k−1 x q,k−1 + u q,k−1 (1) where F k−1 is the state transition matrix, and u q,k−1 is a zero-mean Gaussian process noise with covariance Q q,k−1 [32] where I 3 is the identity matrix of order 3, ⊗ is the Kronecker product, and τ q denotes the process noise intensity [33].

B. Dynamic Model for UAV n
As is shown in Fig. 1, we consider UAV n flies in 3-D Cartesian coordinates. Since the fusion time interval is small generally, the dynamic of UAV n can be described as a constant velocity model during the fusion time interval (t k−1 , t k ) where V n,k = {v n,k | ∥v n,k ∥ 2 ≤ v n max , ∡(v n,k , v n,k−1 ) ≤ α n max } implies the velocity of UAV n is restricted by the maximum speed v n max and the maximum turning angle α n max , and ∡(v n,k , v n,k−1 ) denotes the angle formed by vectors v n,k and v n,k−1 . From (4), we see the position of UAV n at time t k is decided by the velocity v n,k for the given initial position p n,k−1 . v n,k can be adjusted at time t k−1 and maintain a constant during the fusion time interval (t k−1 , t k ). Limited by dynamic constraints, the possible positions of UAV n at time instant t k are collected in the set D n,k , given by C. Measurement Model for Target q w.r.t. UAV n During the kth fusion time interval (t k−1 , t k ), each sensor may receive multiple measurements from different targets, as shown in Fig. 2. We define M n,q,k as the number of measurements received by sensor n from target q and refer to t n,q,k (m) as the arrival time of the mth measurement from target q w.r.t. sensor n. The number of measurements M n,q,k obeys a Poisson process with an exponential interarrival time [34]. It is assumed that the sensor network has been synchronized. In this case, the arrival time of AOA measurements for target q w.r.t. different sensors are the same, i.e., t 1,q,k (m) = t 2,q,k (m) = · · · = t N ,q,k (m) = t q,k (m) and M 1,q,k = M 2,q,k = · · · = M N ,q,k = M q,k . The arrival time interval between two successive AOA measurements from target q received by sensor n may not be equal. During the kth fusion time interval, the mth AOA measurement from target q received by sensor n is where h(x where x p q,k (m) = (x m q,k , y m q,k , z m q,k ) T is the position of target q at time t q,k (m), p n,k (m) = (x m n,k , y m n,k , z m n,k ) T is the position of UAV n, d m n,q,k = ((x m q,k − x m n,k ) 2 + (y m q,k − y m n,k ) 2 ) 1/2 is the distance from the qth target to UAV n in x y plane, θ m n,q,k ∈ [0, 2π ] and ϕ m n,q,k ∈ [0, π] are azimuth and elevation, respectively. In (6), w m n,q,k is the measurement noise that follows a zero-mean Gaussian distribution with covariance where ∝ is a proportional operator, κ m n,q,k is the radar cross section (RCS) of target q w.r.t. UAV n. θ 3 dB and ϕ 3 dB are the 3 dB receive beamwidth in azimuth and elevation, respectively [36]. R m n,q,k = ∥x p q,k (m) − p n,k (m)∥ 2 represents the distance from the qth target to the nth UAV. It can be observed that the measurement error will decrease as the UAV approaches the target.

III. MTT WITH ASYNCHRONOUS MEASUREMENTS
We assume that multiple targets are widely separated in the surveillance area. In this scenario, the MTT task can be divided into several independent single-target tracking problems. A two-step method is applied for target tracking with asynchronous AOA measurements. First, the ML method is employed to construct CMs for multiple targets at each fusion time instant [28]. Then, we adopt the KF for state estimation and prediction [37].

A. Formation of CM for Target q
The CM for target q at the fusion time instant t k is x q,k , which can be estimated by maximizing the likelihood function of x q,k , based on the AOA measurements span the fusion time The probability density function (pdf) of Z q,k conditioned on x q,k is p Z q,k | x q,k , then the ML estimator x q,k is where From (10) and (11), we convert the ML problem as a weighted least square problem. It is hard for us to obtain a closed-form solution to problem (10) since Z q,k | x q,k is a nonlinear function. Consequently, the ML estimate can be obtained by the Newton iteration method [38], the state estimate for target q after the lth iteration is where the nonlinear measurement function can be represented as follows: In (13), h F(t q,k (m), t k )x l q,k , p n,k (m) predictes the measurement from the fusion time t k to the arrival time t q,k (m), where F(t q,k (m), t k ) denotes the transition matrix In (12), q,k = blkdiag( 1 1,q,k , . . . , m n,q,k , . . . , M q,k N ,q,k ) denotes the covariance matrix of the measurement noise, H l q,k is the Jacobian matrix at the lth iteration where H l n,q,k (m) corresponding to the mth measurement evaluated at x l q,k where ∇ denotes the partial derivative operator, and x l q,k (m) = F(t q,k (m), t k )x l q,k predicts the target state from the fusion time t k to the arrival time t q,k (m). An initial state estimate for the qth target is required for the Newton iteration method, which can take the predicted state, since the fusion time interval is small generally, e.g., two seconds as considered in this work.
The CRLB provides a lower bound for any unbiased estimator. Given the true state x q,k and the state estimate x q,k , the CRLB can be represented as where CB q,k is the Fisher information matrix (FIM) The FIM requires knowledge of the true state of the target, which is unknown in practice. Therefore, the FIM is approximated by replacing the true state with the CM produced by the ML estimator. It is shown that the ML estimator is statistically efficient for as few as two AOA measurements [28] (variety with UAV−target geometries). Therefore, the CRLB matrix can be used as the measurement-noise covariance for the resulting CM x q,k in the KF.

B. The KF for Target q
Suppose that the track initialization is available, and thus, we can obtain the initial state estimatex q,k=0 and the corresponding initial covarianceP q,k=0 for target q. It is assumed that the pdf of x q,k follows a Gaussian distribution, i.e., p(x q,k |x q,k ) = N (x q,k ,P q,k ), wherex q,k andP q,k are the state estimate and the corresponding covariance, respectively. From (1), the predicted pdf is p(x q,k |x q,k−1 ) = N (x q,k|k−1 , P q,k|k−1 ), and the predicted state x q,k|k−1 and corresponding covariance P q,k|k−1 are [37] x If no measurements are received from the qth target during the kth fusion time interval, the KF regards the predicted state and the corresponding covariance as the state estimate and the estimated covariance, respectively. On the contrary, the innovation and the corresponding covariance can be obtained from the predicted state and the CM as In (21), x q,k|k−1 = F k−1xq,k−1 is the predicted state. The state estimatex q,k and the corresponding covarianceP q,k can be obtained byx where G q,k = P q,k|k−1 S −1 q,k is the Kalman gain [37].

IV. CTO FOR PASSIVE MTT
This article aims to collaboratively optimize the trajectories of multi-UAV to maximize the MTT performance. The closed-loop asynchronous tracking framework is shown in Fig. 3, where the two-step tracking method is utilized for state estimation and prediction. According to the prediction information, the PC-CRLB of each target can be calculated, then the summation of these PC-CRLBs is used as the objective function of the CTO problem. The dynamic and security constraints can be specified according to the UAV's performance parameters and application scenarios. Finally, the CSM is proposed for the resulting CTO problem to guide the waypoints of multi-UAV next time. We first formulate the CTO model and then introduce the CSM.

A. PC-CRLB for Target q
The PC-CRLB is adopted as the tracking performance measure to predetermine the waypoints of multi-UAV at the next fusion time, which is defined as [30] where J(x q,k ) is the predicted conditional FIM (PC-FIM), whose inverse is the PC-CRLB. The PC-FIM is In (26), p x q,k , Z q,k |Z q,1:k−1 is the joint pdf conditioned on the set Z q,1:k−1 that collects measurements from time t 1 to t k−1 , and x q,k x q,k denotes a second-order partial derivative operator w.r.t. x q,k . The PC-FIM can be simplified as the summation of two terms as follows: In (27), J P (x q,k |Z q,1:k−1 ) is the prior information term where the expectation is taken w.r.t. the predicted pdf.
It is hard to analytically evaluate the J P x q,k |Z q,1:k−1 .
Since the predicted pdf of the target q obeys a Gaussian distribution, we can approximate J P x q,k |Z q,1:k−1 as the inverse of the predicted covariance matrix, i.e., J P x q,k |Z q,1:k−1 ≈ P −1 k|k−1 . In (27), J D x q,k |Z q,1:k−1 is the data information term, which can be expressed as follows: Similarly, the evaluation of J D (x q,k |Z q,1:k−1 ) involves the expectation operation, which is usually implemented by the Monte Carlo experiments. Under the assumption that the process noise is relatively small, we can approximate J D (x q,k |Z q,1:k−1 ) as [39] J D x q,k |Z q,1:k−1 The arrival time of measurements w.r.t. target q span randomly in the fusion time interval (t k−1 , t k ), and we cannot predict them at the time instant t k−1 . Therefore, we calculate J D (x q,k |Z q,1:k−1 ) according to the predicted state from the fusion time instant t k−1 to t k . In (30), n,q,k (p n,k ) is the predicted covariance of the measurement noise [see (8) for details].H n,q,k (p n,k ) means that the predicted Jacobian matrix is a function of the UAV's position where components of the partial derivatives are In (32), R n,q,k|k−1 = ∥p n,k −x p q,k|k−1 ∥ 2 is the predicted distance from UAV n to target q, d n,q,k|k−1 = ((x q,k|k−1 − x n,k ) 2 + (y q,k|k−1 − y n,k ) 2 ) 1/2 denotes the predicted distance in xy plane.
Based on the above-mentioned derivation, we can rewrite the PC-FIM as follows:

B. CTO Model
The PC-FIM (33) shows that the positions of multi-UAV have an impact on the tracking performance, which motivates us to optimize the trajectories of multiple UAVs to achieve the optimal MTT performance. Since the elements of the PC-CRLB matrix have different units, it cannot be adopted as an objective function directly, we perform a trace operation on the normalized PC-CRLB as follows: where the position vector of multi-UAV is defined as p k = (p T 1,k , p T 2,k , . . . , p T N ,k ) T , and = I 3 ⊗ blkdiag(1, T 0 ) is the normalization matrix.
In general, the trajectories of UAVs are subject to some dynamic and security constraints. The dynamic constraints are given in (5), the first type of security constraint is that the flying altitude of each UAV should be greater than or equal to the minimum height H min where p n,k (3) denotes the third element of p n,k , and a,k is a set that collects all indexes of UAVs that may fly below the minimum height at the next time step. The second type of security constraint is that any two UAVs should be separated by a certain distance d col , that is, for all (n 1 , n 2 ) ∈ s,k and n 2 > n 1 ∥p n 1 ,k − p n 2 ,k ∥ 2 ≥ d col , (n 1 , n 2 ) ∈ s,k where s,k is a set that collects all index pairs of any two UAVs which may collide. In other words, if two UAVs are unlikely to appear in their respective range of motion during the next fusion time interval, the constraint (36) will not work.
The third kind of security constraint is that each UAV should keep a minimum clearance distance r i from a given position These constraints (37) can be configured to achieve obstacle/target/threat avoidance by appropriately selecting c i,k and r i , i ∈ {1, 2, . . . , N B + N T + Q}, where N B and N T denote the number of obstacle points and threats, respectively. The obstacle points are given by a 3-D digital terrain map [40], which is reloaded at each fusion time instant. The positions of threats (e.g., enemy radars) are assumed known. If c i,k = x p q,k|k−1 , it means that the UAV should keep a distance from target q to avoid being detected by it. Similarly, if the UAV n is far away from the obstacle i, i.e., (n, i) / ∈ b,k , this constraint will be inactive. The constraints (37) may also be modified to define arbitrary noncircular no-fly zones.
Combining (5) and (IV-B)-(37), we formulate the time-varying CTO model as where f (p k ) = Q q=1 f q (p k ) denotes the overall MTT accuracy. Note that the security constraints (36) and (37) are nonconvex sets and the objective is a nonconvex function, and the CTO model (38) is a time-varying nonconvex optimization problem, which is hard to obtain its optimal solution. In the following, the CSM is proposed to find the suboptimal trajectories of multiple UAVs in different situations.
The following remarks may be deduced from the CTO model (38).
Remark 1: We note that the minimum flying height constraint in (38) may be redundant when specific obstacle avoidance constraints are enforced. For instance, the height constraint becomes redundant when clearance distances for obstacle point avoidance are much larger than the minimum flying height, i.e., r i ≫ H min . On the other hand, if UAVs are far from obstacle points and require keeping highaltitude flying, then the minimum flying height constraint is necessary.
Remark 2: In [10], [11], [12], [13], [14], the heading angle and speed are adopted as optimization variables, and the dynamic model for UAV n is reformulated as p n,k = p n,k−1 + v n,k T 0 ς n,k , where v n,k ≤ v n max is the speed of UAV n, ς n,k = [cos(β n,k ) cos(φ n,k ), sin(β n,k ) cos(φ n,k ), sin(φ n,k )] T is the direction vector of velocity. β n,k and φ n,k are azimuth and elevation of the velocity, respectively, which satisfies ∡(ς n,k , ς n,k−1 ) ≤ α n max . The adaptable variables in the new dynamic model are {v n,k , β n,k , φ n,k }. We see that the new formulation is equivalent to model (4), but it introduces transcendental functions, which increases the nonlinearity of security constraints. Therefore, the dynamic and CTO models

Algorithm 1 The NSPG Method
Input: l = 0, λ l k , p l k , ε, λ min and λ max ; repeat 1. Calculate the search direction: d l k = P D k (p l k − λ l k ∇ p l k f (p k )) − p l k ; 2. Calculate the step length γ l k according to Algorithm 2 and set p l+1 k = p l k + γ l k d l k 3. Update the spectral step length: we built are simpler than the previous studies [10], [11], [12], [13], [14].

C. CSM for the CTO Problem
In this section, the CSM is designed to solve the nonconvex problem (37) by exploring its time-varying property. The CSM begins with a judgment of which security constraints are active based on the current position and velocity of the UAV as well as the target. According to whether the security constraints work, we design two algorithms to obtain the suboptimal trajectories of multi-UAV. Then, the CSM is presented based on the two algorithms.
Case 1 (CTO Without Security Constraints): If any two UAVs are far apart at an altitude higher than H min , and the UAVs are far away from multiple targets, threats, and obstacles, the security constraints may not work. In this case, the CTO model can be recast as which is a nonconvex optimization problem with convex constraints, since D n,k is the intersection of a ball and a cone. The PG method [10] can be applied to solve problem (39) for a suboptimal solution, but with a slow convergence rate. To this end, we introduce the spectral gradient idea with a nonmonotone line search (NLS) to the PG method to speed up its convergence [31]. The NSPG for problem (39) is given in Algorithm 1, where ε is the tolerance for the NSPG method, λ l k denotes the spectral step length at the lth iteration, [λ min , λ max ] are the minimum and maximum safeguarding parameters for λ l k , D k = D 1,k ∪ D 2,k ∪ · · · ∪ D N ,k , and γ l k represents the step length that can be obtained by the NLS as listed in Algorithm 2.

Algorithm 2 The NLS Method
Input: d l k , p l k , L, γ l k = 1, δ, σ 1 and σ 2 ; Output: γ l k repeat 1. Calculate f max = max 0≤i≤min{l,L−1} f (p l−i k ) 2. Update the step length: parameters that satisfy 0 < σ 1 < σ 2 < 1. The main idea behind the NLS is that the steepest descent method is very slow but it can be accelerated by taking a step size that comes from the 1-D minimization at the previous step, instead of the one that comes from the minimization of the function along the gradient of the current iteration.
In Algorithm 1, P D k (p which can be easily solved by the interior point solver [41] but with a slow convergence rate and poor scalability. Instead, we propose a projection operator with a closed-form solution, which can be implemented parallelly. In particular, we can separate problem (40) into N subproblems, as the feasible domains of any two UAVs (n 1 ̸ = n 2 ) satisfy D n1,k ∩D n2,k = ∅, the nth subproblem is min p n,k ∥p n,k − p temp n,k ∥ 2 s.t. p n,k ∈ D n,k .
The optimal solution for problem (41) is P D n,k (p temp n,k ). The projection process for the given point p temp n,k involves the coordinate system transformation, as shown in Fig. 4. We first pan the origin coordinate system to the point p n,k−1 , and then rotate the z-axis to the v n,k−1 direction. The rotation step includes rotating β n,k−1 degrees around the z-axis and φ n,k−1 degrees around the y-axis. The rotation matrix is defined as and are rotation matrices w.r.t. y-axis and z-axis, respectively. After coordinate transformation, the position of p temp n,k is given by Recalling that the feasible domain D n,k is the intersection of a ball and a cone. We define OA (the generatrix of the cone in Fig. 4) to be the line segment where the point p ′ temp n,k lies, the coordinate of point A is p A n,k =v n max T 0 sin α n max cos(ψ n,k ), sin α n max sin(ψ n,k ), cos α n where η n,k = (p A n,k ) T p ′ temp n,k /∥p A n,k ∥ 2 . According to (44)-(46), we can obtain a closed-form expression for the projected point.
Case 2 (CTO With Security Constraints): In this case, some security constraints may be active during the tracking process, assuming that all three kinds of security constraints are activated, i.e., a,k , s,k , and b,k are nonempty sets in problem (38). Note (38) is hard to deal with, since it may entail a number of coupled nonconvex constraints. In light of the challenge of computation, it is desirable to develop an efficient algorithm to obtain high-quality suboptimal solutions to problem (38). To this end, an ADPM is developed to solve it with guarantee convergence under several conditions.
To present the ADPM, we first define a n , k = p n,k (3), n ∈ a,k and denote a k as a vector concatenating all a n , k , n ∈ a,k . Based on (47), we can compactly rewrite (35) as where A 1 is a coefficient matrix, which can be obtained from (47), the feasible domain of a k is D a,k = {a n,k | a n , k ≥ H min , n ∈ a,k }.
For the second type of security constraint, we define and let s k be a vector that concatenates all the s n 1,2 ,k , (n1, n2) ∈ s,k . A compact form of (49) is where A 2 is a coefficient matrix, which can be obtained from (49). From (36), the feasible domain of s k is D s,k = {s n 1,2 ,k | ∥s where b k is a vector that concatenates all the b ni,k , (n, i) ∈ b,k with the feasible domain D b,k = {b ni,k | ∥b ni,k ∥ 2 ≥ r i , (n, i) ∈ b,k }. Then, we can equivalently rewrite (37) as an equality constraint as where A 3 is a coefficient matrix, and c k is a vector that concatenates all the c i,k , (·, i) ∈ b,k . Then, we can equivalently reformulate (38) as The scaled augmented Lagrangian function of (53) is where χ k , µ k , and ν k are scaled Lagrange multipliers, and ρ 1 , ρ 2 , ρ 3 > 0 denote penalty parameters. The formulation of (54) plays a fundamental role in the scaled ADPM [27], we denote p l k , a l k , s l k , b l k , χ l k , µ l k , ν l k , ρ l 1 , ρ l 2 , and ρ l 3 as estimates of parameters at the lth iteration, and the iteration steps of the ADPM are given as follows.
Step 1: Update a k by solving the following problem: which can be split into N a,k = | a,k | subproblems, and be solved parallelly, where | a,k | is defined as the size of a,k . The ith (i = 1, . . . , N a,k ) subproblem is where a,k (i) denotes the ith element of the set a,k , χ l i,k is the ith element of the vector χ l k .
The closed-form solution of problem (56) at the lth iteration is Step 2: Update s k by solving problem (58) Note that problem (58) can be decomposed into N s,k = | s,k | subproblems, which can be solved parallelly. The ith where ( j1, j2) = s,k (i), s i,k is the ith block of the vector s k . The closed-form solution of (59) is (60) Step 3: Update b k by solving the following problem: Similarly, we divide problem (61) into N b,k = | b,k | subproblems and solve them parallelly. The ith (i = 1, . . . , N b,k ) subproblem is where ( j1, j2) ∈ b,k , and b i,k are ith block of b k . The closed-form solution to problem (62) is given by Step 4: Update p k by solving problem (64) s.t. p n,k ∈ D n,k , n = 1, 2, . . . , N .
Although (64) is a nonconvex problem [42], the variable p k = [p T 1,k , p T 2,k , . . . , p T N ,k ] can be divided into N blocks, each one with a convex constraint. Therefore, we can solve problem (64) with Algorithm 1 for a stationary point at the lth iteration.
If ρ l+1 1 = ρ l 1 , ρ l+1 2 = ρ l 2 , and ρ l+1 3 = ρ l 3 , the ADPM falls into the classic ADMM [27]. The idea behind the penalty and dual-parameters update is to try to keep the primal residual norms converging to zero. Specifically, if △r l a does not decrease with iterations, a larger ρ l+1 1 is utilized to make △r l a approaches zero to find a feasible solution. This operation not only improves the convergence of the classic ADMM but also makes the ADPM less dependent on the initial point [43]. As far as we know, the global convergence of the ADPM for general nonconvex problems is still an open problem [44]. On the other hand, our numerical results suggest that the ADPM can converge from any starting point. Toward understanding the numerical behavior, we provide the following weak converge result for ADPM, where the bounded of multipliers and the diminishing of ∥p l+1 k − p l k ∥ 2 are always observed numerically.
Proof: See the Appendix. Repeat steps 1-6 until some terminal conditions are satisfied, e.g., a maximum iteration number. We summarize the iteration steps of the scaled ADPM in Algorithm 3. Note that all kinds of security constraints are considered in Algorithm 3, which is the most complex case. In practice, we need to first determine which security constraints work, and then use Algorithm 3 to solve problem (53) after removing the inactive constraints, which can improve the efficiency of the algorithm. Combining the two cases mentioned earlier, we give the timevarying CSM, as shown in Algorithm 4.

V. SIMULATION RESULTS
This section presents simulation results to evaluate the performance of the proposed CSM in different scenarios.   respectively. The initial states for multiple targets in the KF and dynamic parameters are shown in Table I. For simplicity, we set the initial velocity of each UAV to [−15, 25, 0] T m/s, the remaining parameters of multi-UAV mounted on AOA sensors are given in Table II. We assume the RCS of target q w.r.t. UAV n is κ n,q,k = 1 m 2 , ∀n, q. In this case, the signalto-noise ratio (SNR) is only related to the geometry factor. The total flight time is 120 s for all experiments.

A. Effectiveness of the Proposed CSM
In this section, numerical results are presented to demonstrate the MTT performance of the CSM in comparison with the conventional unoptimized method and the GA algorithm. The GA imitates natural selection and survival of the fittest, which is applied to find the optimal solution to the CTO problem in [20]. Since the GA is categorized as a global search method, it can be used to check the effectiveness of the CSM.
We first compare the passive MTT performance of these methods. Following the steps of the CSM listed in Algorithm 4, the root of objective function values of problem (38) corresponding to three methods are shown in Fig. 6. It is observed that the proposed CSM can achieve considerable MTT performance improvement over the conventional unoptimized method. In particular, the performance improvement is increasing as the frame elapses, and it reaches more than 50% in the last few frames. This is because both angular separation and targets' SNRs improve as the multi-UAV move toward multiple targets. Besides, the MTT performance of CSM and GA are nearly the same. Since there are no theoretical results on the iteration complexity of the ADPM, NSPG, and GA for the general nonconvex problem, as an alternative, we compare the runtime of the CSM and GA. The runtime comparison is carried out on a 2.9-GHz Intel Core i7-10700 processor with 32-GB RAM and averaged on the total number of frames in MATLAB 9.8.0 (2020a). The average runtime of the CSM is about 0.05 s, while that of the GA is 52.4 s. The GA costs the most time to find the optimal solution, due to its heuristic property. Combining the results in Fig. 6, we can conclude that the CSM achieves near-optimal MTT performance with much lower complexity than the GA.

B. Passive MTT Performance of the Two-Step Method
In this section, we test the effectiveness of the two-step tracking method. The root-mean-square error (RMSE) for target q is defined as follows:  where N mc = 100 represents the number of Monte Carlo trials.
x err q,k =x m q,k − x q,k denotes the estimation error, andx m q,k is the state estimate at the mth trial. The RMSEs for multiple targets are depicted in Fig. 7.
From the results in Fig. 7, we see that RMSEs for multiple targets w.r.t. GA and CSM are nearly the same, which is consistent with the results in Fig. 6. The RMSE of target 1 is barely improved after the CTO process, while the RMSEs of targets 2 and 3 are significantly improved. This is due to the unoptimized linear trajectories moving toward target 1, which has an increasing SNR. After the CTO, UAVs 1 and 2 move toward target 1, and UAVs 3 and 4 gradually approach targets 2 and 3 (see Fig. 8). Therefore, the proposed CSM achieves a higher overall MTT accuracy than the unoptimized method by improving the tracking accuracy of targets 2 and 3 while maintaining the tracking accuracy of target 1.

C. Effects of Threat Avoidance on MTT Performance
The optimized trajectories of UAVs w.r.t. the CSM are shown in Fig. 8. We can see that multi-UAVs move toward multiple targets and the inter-UAV distances are increasing as time elapses. In this case, the SNRs of multiple targets and the angular separation of multi-UAV are continuously improved. This is consistent with the conclusions drawn from Fig. 6.
During the CTO process, obstacle avoidance constraints are activated at some frames, and the corresponding distances   Table III depict that the CSM is capable to avoid targets/threats when the distance between the UAV and the target/threat may be less than the clearance distance.

D. Effects of Maneuver on MTT Performance
The maneuver capability of multi-UAV is affected by the maximum speed v n max and the maximum turning angle α n max . To explore the effects of v n max and α n max on MTT performance. We consider six combinations of maximum speed and maximum turning angle, namely, (v n max , α n max ). The root objective function values corresponding to these cases are compared in Fig. 9.
In Fig. 9, cases 1-4 w.r.t. the CSM are used to verify the influence of the turning angle constraint on MTT performance, and cases 4-6 are used to verify the effect of speed constraint on MTT performance. In the first 20 frames, the tracking accuracy of case 1 is very close to that of the unoptimized method. This is because the initial speed (29.15 m/s) of the target is close to the maximum speed, and it is difficult to obtain an appropriate angular separation by a small turning angle (the CTO problem is similar to the steering control problem in this case). Comparing case 1 with case 2, we can observe the MTT performance can be further improved by increasing the maximum turning angle, but if the optimal turning angle corresponding to the maximum speed is less than the maximum turning angle, the MTT accuracy cannot be improved by increasing the maximum turning angle (see cases 2 to 4). Cases 4-6 show that the effect of maximum speed  constraint on MTT performance is relatively obvious. This is because the UAV is far from the target at frame k = 1, and the SNR is relatively low. In these cases, multiple UAVs should fly toward multitarget to improve the SNR, thus improving the tracking accuracy. In addition, when the UAV approaches the target, the performance improvement may become smaller since the UAV needs to keep a safe distance from the target (see cases 5 and 6 after frames 55).

E. Effects of Collision Avoidance on MTT Performance
In Fig. 5, the distance between every two adjacent UAVs is 500 m at frame k = 1. In this scenario, the collision avoidance is inactive during the CTO process, since multiple UAVs pull away from each other to obtain a good angular separation. In this section, we consider another scenario in Fig. 10, where the initial distance between adjacent UAVs is set to be smaller. The initial positions of UAVs 1 to 4 are set to be [4, 0, 1.7] T km, In scenario 2, the collision avoidance constraints may be activated in the first few frames. The optimized trajectories of multi-UAV are shown in Fig. 11. We can observe that UAVs 1 and 2 gradually approach target 1 while maintaining a safe distance (300 m) from obstacles during the CTO process. The obstacle avoidance constraints are activated at frames  k = 47 ∼ 50, during which a distance of at least 300 m was maintained between UAVs 1 and 2 and the obstacle points. Note that these obstacles also act as occlusions, that is, a payload sensor may not receive AOA measurements from a target when it is occluded. The occlusion leads to performance degradation of the estimator for the CM, due to the reduction in the number of measurements received. Fortunately, the obstacles did not obscure the UAV's light of sight in Fig. 11. UAVs 3 and 4 move toward targets 2 and 3 at the first few frames, then they chase these targets after the UAV and the target gets closer.
The inter-UAV distances between two UAVs that may collide and distances between UAVs and targets that may trigger threat avoidance constraints are shown in Fig. 12. It can be seen that any two adjacent UAVs are able to keep a safe distance, and UAVs 2 and 3 maintain a safe distance from targets 1 and 3, respectively. At frames k = 20, 36, the distances between UAV 3 and target 3 are slightly less than 500 m, due to the error between the predicted state provided by the KF and the true state of target 3.
The comparison of the objective functions between the proposed CSM, GA, and the unoptimized algorithm in scenario 2 is shown in Fig. 13. We can observe that the proposed CSM still achieves considerable MTT performance improvement over the unoptimized method and provides nearly the same performance as the GA. It is consistent with the conclusions drawn from Fig. 6. The performance improvement of the proposed CSM in scenario 2 is smaller than that in scenario 1 since the SNRs of multiple targets are higher in scenario 2 than that in scenario 1.

VI. CONCLUSION
A 3-D CTO framework is proposed in this article, which collects measurements from the AOA sensors mounted on multi-UAV, constructs the tracks of multiple targets, and computes the control commands for the UAVs. A two-step tracking method is applied in MTT to provide the prior information for CTO, and the PC-CRLB is adopted as the tracking performance measure. We represent the height, collision, and obstacle/target/threat avoidance as time-varying security constraints, and then formulate the CTO as a nonconvex problem subjected to the security and dynamic constraints. The CSM is presented to solve the resulting CTO problem. If all security constraints are inactive, the CTO can be simplified as a nonconvex problem with convex dynamic constraints, which can be solved by the NSPG method. Conversely, we present an ADPM to solve the CTO problem with some positive security constraints. The ADPM introduces auxiliary vectors to decouple the complicated constraints and separates the CTO into several subproblems and tackles them alternately, while locally adjusting the penalty factor at each iteration. Numerical examples demonstrate that the CSM provides a significant improvement in passive MTT performance in comparison with the unoptimized method. Besides, the CSM achieves near-optimal performance with much lower complexity compared with the GA. The CSM is myopic since it only considers the MTT performance in the next time step. A more effective way would be the multistep CTO since each optimization will affect the possible trajectories over several future fusion intervals. The multistep CTO problem will be considered as our further work. He is currently a Professor and the Director with the National Laboratory of Radar Signal Processing, Xidian University. His research interests include radar signal processing, radar automatic target recognition, adaptive signal processing, and cognitive radar.