Game theoretic distributed waveform design for multistatic radar networks

We examine the interaction of multiple-input multiple-output based clusters of radars within a game theoretic framework, using potential games. The objective is to maximise the signal-to-disturbance ratio of the clusters of radars by selecting the most appropriate waveforms. We prove that the proposed game theoretic algorithm converges to a unique Nash equilibrium using discrete concavity and the larger midpoint property. As a result, each cluster can determine the best waveform for illumination (equilibrium) by strategising the actions of the other clusters.


I. INTRODUCTION
Game theory is a branch of mathematics that models and analyses the interaction of decision makers, called players, under the assumptions of the rationality of players and strategic interdependence. Each player aims to maximise his/her gain (utility) as a best response to the actions of the other players [1]. In a radar network, game theoretic methods can improve the performance of the radars by modelling their interaction as a game and finding the state, called equilibrium, where the performance is maximised for all radars simultaneously.
Various game theoretic approaches have recently been proposed for modelling and allocating resources and have been shown to enhance the performance of the underlying system. For example, the interaction of a radar and a missile has been examined through differential games [2]. The radar aims to minimise the uncertainty of the missile's position by changing the filter gain, while the missile tries to maximise this uncertainty. The interaction between a multiple-input multiple-output (MIMO) radar [3,4] and an opponent is modelled as a zero-sum game in [5]. The radar uses different signal polarisations in order to detect the target, while the opponent uses different types of aerial vehicles to avoid being detected. In [6], a radar with constant false alarm rate processing aims to detect a target equipped with a jammer as a self-defence mechanism. The authors examined the scenarios of surface surveillance and target detection, and used a zero-sum game to model the interaction between the radar and the jammer. In [7], the radar and jamming interaction was formulated as a zero-sum game using mutual information as a criterion for optimisation. In a radar network, the problem of allocating power that will result in target detection while maintaining the interference at low levels was modelled as a generalized Nash game [8]. In [9,10], this work was extended to a MIMO radar network. In [11], the authors proposed a waveform allocation scheme for three different types of receiver filters, based on potential games, that improves the signal-to-disturbance ratio (SDR). It has been shown that through an iterative process of sequential waveform adaptations, the radars can reach the Nash equilibrium and maximise their performance.
The main contributions of this work are the generalisation of the monostatic game theoretic model of [11] to a MIMO radar network and, most importantly, a detailed game theoretic analysis for the proof of the existence and uniqueness of the Nash equilibrium using the larger midpoint property (LMP) [12]. Moreover, as the potential function in [11] also possesses the LMP, which is instrumental for the proof of uniqueness, our analyses also complete the proof required for uniqueness of the work in [11].
In terms of the motivation of the work, it should be highlighted that the work considered here is not a game between adversaries. Instead, we consider a situation in which a group of friendly radars aims to optimise and select the best waveforms for illumination distributively, without requiring explicit communication among themselves. Noncooperative game theory fits this purpose well, as also prescribed in [11]. In particular, the uniqueness of the equilibrium as proved in this paper allows clusters of radars to interact strategically and reproduce the actions of all the other clusters without exchanging any information, and hence to choose the most appropriate waveforms. Part of the results presented in this work also appear in [13].
A brief introduction to game theory with an emphasis on potential games is given in Section II. A detailed description of the proposed model together with the game theoretic analysis is presented in Section III. Section IV presents the simulation results and related comments, followed by conclusions in Section V.
Following the customary notation, all vectors appear in bold, while the superscript $H$ denotes the conjugate transpose of a complex vector. The inner product between two vectors $\mathbf{x}$ and $\mathbf{y}$ is denoted by $\langle \mathbf{x}, \mathbf{y} \rangle$, and $\|\mathbf{x}\|_1$ denotes the $\ell_1$ norm of a vector $\mathbf{x}$.

II. GAME THEORY AND POTENTIAL GAMES
The characteristics of the interaction of the players are summarised in the triple $\mathcal{G} = \langle \mathcal{N}, \{S_i\}_{i \in \mathcal{N}}, \{u_i\}_{i \in \mathcal{N}} \rangle$, which is called a game in strategic form. The finite set of players is denoted by $\mathcal{N}$ and has size $N \in \mathbb{N}$. The set $S_i$ consists of the actions of player $i$, while the utility of this player is given by the function $u_i$, $\forall i \in \mathcal{N}$. Let $S = S_1 \times \cdots \times S_N$. Then any element of $S$ is called an action profile. The solution concept that we use in this work is the Nash equilibrium, which is defined as the action profile $(s_1^*, \ldots, s_N^*)$ such that
$$u_i(s_i^*, s_{-i}^*) \geq u_i(s_i, s_{-i}^*) \quad \forall s_i \in S_i,\ \forall i \in \mathcal{N},$$
where the subscript $-i$ denotes all players apart from player $i$. In other words, no player can increase his/her utility by deviating from the equilibrium, provided that the other players follow the Nash equilibrium profile [1].
An exact potential game [14], or in short potential game, is a type of game with the property that there exists a global function that reflects the change in utility of a player as a result of a change in this player's action. This function is called the potential of the game. In mathematical terms, a game $\mathcal{G} = \langle \mathcal{N}, \{S_i\}_{i \in \mathcal{N}}, \{u_i\}_{i \in \mathcal{N}} \rangle$ is a potential game if there exists a function $P : S \to \mathbb{R}$ such that $\forall i \in \mathcal{N}$ and $\forall s_i, s_i' \in S_i$, $s_{-i} \in S_{-i}$,
$$u_i(s_i, s_{-i}) - u_i(s_i', s_{-i}) = P(s_i, s_{-i}) - P(s_i', s_{-i}). \tag{1}$$
Using the definition of the Nash equilibrium in combination with the property of potential games, we can see that any point $s \in S$ that maximises $P$ is a Nash equilibrium [14]. Furthermore, notice that in exact potential games, the set of points that maximise the utility function coincides with the set of points that maximise the potential. This property gives rise to a class of potential games called best-response potential games, as defined in [15]. By definition, the potential function assigns a real number to every possible action profile. Because it is not associated with any player, it can be thought of as a function that describes the behaviour of the set of players as a whole. As a consequence, potential games give the players the opportunity to maximise their utility and at the same time improve the social welfare of the group without the need for cooperation. The MIMO radar game under this potential game setup is described in the next section.
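As a minimal sketch of the exact potential property, the following Python snippet checks it on a toy two-player game and confirms that the maximiser of the potential is a Nash equilibrium. The game, its payoffs, and the helper names (`is_exact_potential`, `is_nash`, `deviate`) are illustrative assumptions and are not taken from the radar model of this paper.

```python
from itertools import product

# Toy two-player game (an assumption for illustration): each player picks 0 or 1.
ACTIONS = [(0, 1), (0, 1)]


def utility(i, s):
    # Common coordination payoff plus a private bonus for choosing action 1.
    common = 2 if s[0] == s[1] else 0
    return common + (1 if s[i] == 1 else 0)


def potential(s):
    # Candidate potential: common payoff plus every player's private bonus.
    common = 2 if s[0] == s[1] else 0
    return common + sum(1 for a in s if a == 1)


def deviate(s, i, a):
    # Profile obtained when player i unilaterally switches to action a.
    t = list(s)
    t[i] = a
    return tuple(t)


def is_exact_potential():
    # Every unilateral change in utility must equal the change in potential.
    for s in product(*ACTIONS):
        for i in range(2):
            for a in ACTIONS[i]:
                d = deviate(s, i, a)
                if utility(i, d) - utility(i, s) != potential(d) - potential(s):
                    return False
    return True


def is_nash(s):
    # No player can gain by a unilateral deviation.
    return all(utility(i, s) >= utility(i, deviate(s, i, a))
               for i in range(2) for a in ACTIONS[i])


best = max(product(*ACTIONS), key=potential)
print(is_exact_potential(), best, is_nash(best))
```

In this toy game the profile (0, 0) is also a Nash equilibrium although it does not maximise the potential, which illustrates why an additional property of the potential is needed before the set of equilibria and the set of maximisers can be identified with each other.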

A. A MIMO Radar Network
We consider a network of radars that are partitioned into clusters to form a MIMO radar configuration. Particularly, the network consists of $K$ clusters, $\mathcal{C} = \{C_1, \ldots, C_K\}$, with each cluster containing $M$ radars, i.e., $C_k = \{R_{k1}, \ldots, R_{kM}\}$ $\forall k = 1, \ldots, K$. The objective of each cluster is to achieve a good detection performance aimed at a common target, which is measured in terms of the SDR. We assume that radars within the same cluster are able to communicate, while information sharing among clusters is not feasible. As a result, intercluster interference is unavoidable. However, clusters do not compete with each other; hence, the interfering signals are unintentional. The goal of each cluster is to maximise its detection performance, while maintaining low interference levels.
In order to utilise the capabilities of the MIMO architecture, we assume that the signals transmitted from each radar within the same cluster are orthogonal to each other. Transmitted signals from different clusters might be correlated due to various reasons, including the absence of cluster transmission synchronisation. Due to the feasibility of communication and synchronisation of radars within each cluster, the return signal at each radar, which is formed of $N$ pulses, is matched filtered with the transmitted waveforms of all radars within the same cluster.
The return signal $\mathbf{x}_{kn}$ at the $n$th radar in the $k$th cluster can be written as in (2), where $\mathbf{s}_{kr}$ denotes the signal vector transmitted by the $r$th radar in cluster $k$, i.e., $R_{kr}$. The parameter $\alpha_{krn}$ relates to the target cross section. The vector $\mathbf{s}_{\ell t}$ represents the interference signal from radar $t$ in the $\ell$th cluster to the radar $R_{kn}$. The penultimate term in (2) denotes the clutter coming from all radars $r = 1, \ldots, M$ in cluster $k$ to radar $R_{kn}$, and $\gamma_{krn,m}$ is the coefficient describing the radar cross section (RCS) of the clutter. The thermal noise vector introduced at radar $R_{kn}$ is denoted by $\mathbf{n}_{kn}$, and $\mathbf{J}_m$ is a shift matrix whose entries are given in (3).
Combining the above, the SDR for the radar $R_{kn}$ is given by (4), where $G_{\ell t k n}$ denotes the antenna gain for radar $R_{kn}$ in the direction of the radar $R_{\ell t}$, for $k, \ell \in \{1, \ldots, K\}$ and $n, t \in \{1, \ldots, M\}$. The noise power is set to $\sigma_n^2 = 1$. The numerator of the SDR describes the return signal echoed by a target, while the denominator consists of the power of clutter echoes, interference, and noise. It should be noted that this game can be extended to multiple targets in various range-Doppler bins, but the utility of each player in this case needs to be modified, e.g., as the SDR of the worst-case target. However, this is beyond the focus of the work presented here.
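The role of the shift matrix in the clutter and interference terms can be illustrated numerically. The sketch below assumes the convention that the $(i, j)$ entry of $\mathbf{J}_m$ equals 1 if $i - j = m$ and 0 otherwise; this convention is an assumption made here for illustration. It evaluates $|\mathbf{s}^H \mathbf{J}_m \mathbf{s}'|^2$, a shifted cross-correlation power of the kind that enters the denominator of an SDR. The helper names and the two example waveforms are hypothetical.

```python
def shift_matrix(n, m):
    """n x n matrix with ones where (row - col) == m (assumed convention)."""
    return [[1 if i - j == m else 0 for j in range(n)] for i in range(n)]


def matvec(a, x):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def inner(x, y):
    """Inner product x^H y (the first argument is conjugated)."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))


def shifted_corr_power(s, s_other, m):
    """Power |s^H J_m s_other|^2 of the correlation at shift m."""
    return abs(inner(s, matvec(shift_matrix(len(s), m), s_other))) ** 2


# Two unit-norm example waveforms of length 4 (hypothetical values).
s1 = [0.5, 0.5, 0.5, 0.5]
s2 = [0.5, -0.5, 0.5, -0.5]

print(shifted_corr_power(s1, s2, 0))  # zero-shift correlation power
print(shifted_corr_power(s1, s2, 1))  # correlation power at a shift of one sample
```

Two waveforms that are orthogonal at zero shift can still correlate at nonzero shifts, which is why the choice of waveforms affects the clutter and interference power even when orthogonality within a cluster is enforced.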

B. Game Theoretic Formulation
Motivated by the work in [11], we model the interaction of the clusters in the network as a potential game. The players of the game are the $K$ clusters of the network, and for the remainder of the paper, the words clusters and players will be used interchangeably. The action set of each cluster consists of $M$-tuples of mutually orthogonal vectors of length $N$, drawn from a predefined set that is publicly known. Specifically, let $W \subset \mathbb{C}^N$ be the predefined set of available waveforms. Then, $\forall k = 1, \ldots, K$, the action set $A_k$ consists of the $M$-tuples of mutually orthogonal waveforms in $W$. Note that the set $W$ is known; hence, the action sets are identical for all players. We also assume that the antenna gains $G_{\ell t k n}$, for $\ell, k = 1, \ldots, K$ and $t, n = 1, \ldots, M$, are publicly known. Because the signals have unity norm, (4) suggests that the SDR is maximised when the denominator is minimised. Extending the utility function in [11] to our MIMO model gives the utility function $u_k$ for player $k$.
Having defined the utility function of the player, the game can now be described by the triple $\mathcal{G} = \langle \mathcal{C}, \{A_k\}_{k \in \{1, \ldots, K\}}, \{u_k\}_{k \in \{1, \ldots, K\}} \rangle$. As in [11], we define the function $P : A \to \mathbb{R}$ to be the sum of the denominators of the SDR of all radars in every cluster, as given in (5). By isolating the terms corresponding to the $k$th player in the summation in (5), we can prove that $P$ satisfies (1), and thus it is a potential function of the game $\mathcal{G}$. As mentioned before, the set of available waveforms $W$ and the antenna gains are known to all players. The radars also need to know the RCS of the clutter, $|\gamma_{ktn,m}|^2$. When the game is played distributively, it is difficult to obtain the instantaneous values of the clutter RCS; however, these can be replaced by their statistical average, i.e., $E[|\gamma_{ktn,m}|^2]$, based on knowledge of the environment obtained, e.g., with the aid of digital terrain maps [16]. As shown later in the simulations, the advantage of the proposed distributed optimisation still prevails strongly even in the absence of knowledge of the instantaneous RCS.
At the beginning of the game, the players randomly choose an $M$-tuple of orthogonal waveforms from their action set $A_k$, for $k = 1, \ldots, K$. Then, in a sequential manner, they update their waveforms according to the potential function. In particular, at round $t$, the player whose turn it is to play updates the waveform (action) by choosing the new waveform $\mathbf{s}_k^t$ that maximises the potential function given the waveforms $\mathbf{s}_{-k}^{t-1}$ of the other players. This iterative process generates the sequence of waveforms (action profiles) in (6). This sequence is called an improvement path because at each step we have $u_k(\mathbf{s}^t) > u_k(\mathbf{s}^{t-1})$ for the player $k$ who updates his/her action at time $t$. Notice that the players maximise the same objective function $P$ with respect to a particular dimension; hence, the order in which they choose to act does not affect the final result of the optimisation. As previously mentioned, the action set $A$ is finite, which implies that the improvement path must be finite. When none of the players can further improve their utility, the improvement path is maximal and terminates at the equilibrium $(\mathbf{s}_1^*, \ldots, \mathbf{s}_K^*)$, where each $\mathbf{s}_k^*$ is a best response to the equilibrium waveforms of the other players.
As mentioned before, the waveform library $W$ and the antenna gains are known to all players. Hence, there is no requirement for player $k$ to know $\mathbf{s}_{-k}^{t-1}$, because he/she can independently recreate the waveform sequence (6) by using the game theoretic algorithm as if he/she were one of the other players. This process is performed by all players independently by assuming that the other players are also strategising. Once the clusters have reached the equilibrium, each one has determined the best waveform for illumination. Additionally, as proved in the next section, the equilibrium is unique; thus, it is unnecessary for the players to agree on a starting point, because they will reach the same equilibrium independently of the initial vector of waveforms. However, the optimum waveforms need to be updated continuously as the location of the target varies, because the clutter statistics and the antenna gains may differ depending on the range bin within which the target falls at any specific time.
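The sequential update described above can be sketched as best-response dynamics on a finite potential game. The following Python example is only an illustration under stated assumptions: the four-waveform library, the symmetric coupling gains in `GAIN`, and the toy potential (negative total pairwise correlation power) are hypothetical and stand in for the library of [11], the antenna gains, and the potential of the radar game. Each player holds a single waveform index rather than an $M$-tuple.

```python
# Hypothetical unit-norm waveform library of length 4 (not the library of [11]).
LIBRARY = [
    (0.5, 0.5, 0.5, 0.5),
    (0.5, -0.5, 0.5, -0.5),
    (0.5, 0.5, -0.5, -0.5),
    (0.5, 0.5, 0.5, -0.5),
]
# Assumed symmetric coupling gains between three players (clusters).
GAIN = [[0.0, 1.0, 0.5],
        [1.0, 0.0, 2.0],
        [0.5, 2.0, 0.0]]


def corr2(a, b):
    """Squared correlation between library waveforms with indices a and b."""
    return abs(sum(x * y for x, y in zip(LIBRARY[a], LIBRARY[b]))) ** 2


def potential(profile):
    """Toy potential: negative total pairwise interference power."""
    k = len(profile)
    return -sum(GAIN[i][j] * corr2(profile[i], profile[j])
                for i in range(k) for j in range(i + 1, k))


def best_response(profile, k):
    """Action of player k that maximises P with the other actions held fixed."""
    def value(a):
        return potential(profile[:k] + (a,) + profile[k + 1:])
    return max(range(len(LIBRARY)), key=value)


def play(profile, max_rounds=50):
    """Sequential updates; returns the improvement path as a list of profiles."""
    path = [profile]
    for _ in range(max_rounds):
        changed = False
        for k in range(len(profile)):
            a = best_response(profile, k)
            new = profile[:k] + (a,) + profile[k + 1:]
            if potential(new) > potential(profile):  # strict improvement only
                profile, changed = new, True
                path.append(profile)
        if not changed:  # no player can improve: the path is maximal
            break
    return path


path = play((0, 0, 0))
print(path, potential(path[-1]))
```

Because every accepted step strictly increases the potential and the set of profiles is finite, the loop must terminate. Since the update rule is deterministic and uses only publicly known quantities, any player can rerun it locally to reproduce the moves of the others.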

C. Existence and Uniqueness of Equilibrium
Due to the finite nature of the game, in terms of both the number of players and their action sets, the existence of an equilibrium is guaranteed according to Corollary 2.2 in [14], which states that every finite potential game attains an equilibrium. In [17], it is shown that if the action sets of all players of the potential game are convex and compact and, additionally, the potential function is continuously differentiable and strictly concave, then the game has a unique Nash equilibrium. However, this result cannot be applied to our case because the potential function is defined over a discrete set. Therefore, we use the results in [12] on discrete concavity for potential games to prove the uniqueness of the equilibrium. We first show that the potential function has the LMP, and then we prove that the set of maximisers of the potential is a singleton. To show that the equilibrium of the game is unique, it suffices to prove that the set of maximisers of the potential function contains only one element. Following Proposition 1 in [12], the LMP implies that a local optimum of the potential is global. Thus, the points of the discrete set $P(\mathbf{w})$ behave as the points of the image of a concave function. The situation where the set of maximisers $\{\mathbf{w}^* \in A \mid \mathbf{w}^* = \arg\max_{\mathbf{w} \in A} P(\mathbf{w})\}$ is not a singleton occurs when there exist distinct points $\mathbf{w}_1^*, \ldots, \mathbf{w}_L^*$ in the domain of $P$ such that $P(\mathbf{w}_1^*) = \cdots = P(\mathbf{w}_L^*)$. Because the domain of the potential function depends entirely on the application, the existence of a unique maximum element depends on the particular domain of $P$.
Observing the potential function (5), we notice that it is zero at the point $\mathbf{0}$, i.e., $P(\mathbf{0}) = 0$, and negative on every other $\mathbf{w}$ in its domain $A$ because the gains are positive numbers. Furthermore, $P(-\mathbf{w}) = P(\mathbf{w})$, which means that the potential is an even function. Thus, if $\mathbf{w}^*$ is an equilibrium point, then all the other maximisers belong to a set denoted by $W^*$. To determine whether the game has multiple equilibria, we simply run the game theoretic algorithm with different initial waveforms and check whether the resulting equilibria are different. We can ensure that the game has a unique equilibrium by removing from the action set all but one of the elements that belong to $A \cap W^*$.
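The effect of the even symmetry on the number of maximisers, and of pruning the waveform set, can be illustrated on a toy example. In the Python sketch below, the two-player potential, the reference directions, and the four-element library are assumptions chosen only so that the maximisers can be enumerated by brute force; they are not the potential (5) or the set $W^*$ of the paper.

```python
from itertools import product

A, B = (1.0, 0.0), (0.0, 1.0)  # two orthonormal example waveforms (hypothetical)


def neg(w):
    """Sign-flipped copy of a waveform."""
    return tuple(-x for x in w)


def corr2(x, y):
    """Squared magnitude of the inner product of two real waveforms."""
    return abs(sum(p * q for p, q in zip(x, y))) ** 2


def potential(w1, w2):
    # Toy potential (assumption): mutual interference plus a clutter-like
    # penalty against a fixed reference direction for each player.
    return -(corr2(w1, w2) + corr2(w1, B) + corr2(w2, A))


def maximisers(library):
    """All action profiles that attain the maximum of the toy potential."""
    profiles = list(product(library, repeat=2))
    best = max(potential(*p) for p in profiles)
    return [p for p in profiles if potential(*p) == best]


full = [A, neg(A), B, neg(B)]  # library containing both w and -w
pruned = [A, B]                # one representative of each pair {w, -w}

print(len(maximisers(full)))   # several maximisers caused by the sign symmetry
print(maximisers(pruned))      # a single maximiser after pruning
```

In this toy setting the potential is even, so every maximiser in the full library reappears with flipped signs, and keeping one representative per sign pair leaves a single maximiser. Whether pruning is sufficient in a given radar scenario depends on the actual domain of the potential.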
We conclude this section by showing that the potential function in [11] also satisfies the LMP. The model in [11] can be thought of as a special case of the model described in Section III, where the clusters in the network consist of only one radar. Thus, we can prove that the potential function in [11], i.e., [11, eq. 6], satisfies the LMP.
COROLLARY 1.2 The potential function defined in [11], i.e., [11, eq. 6], satisfies the LMP.
PROOF The proof follows the same steps as the proof of Proposition 1 with $M = 1$.

IV. SIMULATION RESULTS: DISCUSSION
To support the theoretical model, we simulate the interaction of clusters for a network that is formed of three clusters, i.e., $C_1$, $C_2$, $C_3$, where each cluster consists of two radars, i.e., $C_k = \{R_{k1}, R_{k2}\}$, for $k = 1, 2, 3$. The action set, which by the definition of our game is the same for all players, consists of pairs of orthogonal waveforms that are taken from the waveform library described in [11]. We assume that two waveforms $\mathbf{w}$, $\mathbf{w}'$ are orthogonal if $\mathbf{w}^H \mathbf{w}' \leq 10^{-6}$. The waveform library that we formed contains 121 pairs of orthogonal waveforms $(\mathbf{w}_{k1}, \mathbf{w}_{k2})$, with $k = 1, 2, 3$, whose elements correspond to the two radars in the cluster. For the initialisation of the game, the players choose pairs of orthogonal waveforms from their action set randomly. The antenna gains for each radar, which are publicly known to all players, are presented in Table I.
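A simple way to form such an action set is to scan a waveform library and keep the pairs whose inner product falls below the tolerance. The sketch below is a hypothetical illustration: the four-waveform library is invented, and the test is applied to the modulus of the inner product, which is an assumption made here so that the comparison is meaningful for complex-valued waveforms.

```python
from itertools import combinations

TOL = 1e-6  # orthogonality tolerance quoted in the text


def inner(x, y):
    """Inner product x^H y (the first argument is conjugated)."""
    return sum(a.conjugate() * b for a, b in zip(x, y))


def orthogonal_pairs(library, tol=TOL):
    """Index pairs (i, j), i < j, whose waveforms are orthogonal within tol."""
    return [(i, j) for i, j in combinations(range(len(library)), 2)
            if abs(inner(library[i], library[j])) <= tol]


# Hypothetical unit-norm library of length-4 waveforms.
library = [
    (0.5, 0.5, 0.5, 0.5),
    (0.5, 0.5j, -0.5, -0.5j),
    (0.5, -0.5, 0.5, -0.5),
    (0.5, 0.5, 0.5, -0.5),
]

print(orthogonal_pairs(library))
```

Each returned pair is a candidate action for a two-radar cluster; a real library would be screened in the same way before the game starts.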
Recall that the notation $G_{\ell t k n}$ denotes the antenna gain for the radar $R_{kn}$ in the direction of radar $R_{\ell t}$. Fig. 1 presents the values of the SDR achieved at each radar receiver for each player, using the proposed game theoretic algorithm for the considered network topology. The performance is compared to the SDR obtained when the players choose the waveforms randomly from the set of waveforms (random choice model). In order to investigate the convergence of the game, in the first simulation, we have fixed the instantaneous RCS of the clutter to its expected value of 1.1170. The convergence to the equilibrium of the game theoretic algorithm is visible in the subfigures, and the equilibrium is reached within a few iterations. Because the equilibrium of the proposed game is unique, the algorithm will converge to it independently of the initial set of waveforms that were used. Hence, the players can reach the same Nash equilibrium regardless of the choice of the initial waveforms and using only the information that is publicly known. In all subfigures of Fig. 1, we can also see the sequential update of the waveforms: when the first player updates the waveform, which results in a better SDR, the other players continue using the initial waveforms until it is their turn to play and improve their SDRs. The same figures include the SDR values when the clusters choose their waveforms randomly. The results of the random choice model were obtained by taking the average SDRs over 100 realisations. A comparison of the two models shows that the performance of the clusters is greatly improved when they follow the game theoretic scheme.
We obtained SDR values at equilibrium for the first cluster, averaged over all radars in the cluster, for both the proposed game theoretic method and the random choice model and for a varying number of clusters in the network, as depicted in Fig. 2. The number of radars in each cluster is two. As the number of clusters increases, the interference induced in the network increases as well, which in turn reduces the SDR values for all clusters. However, when the clusters follow the game theoretic algorithm, their SDR is still better than the SDR obtained using the random waveform selection. For the final simulation, we allowed the RCS of the target and the clutter to take random values as prescribed by the Rayleigh and Weibull distributions, respectively, with the parameters mentioned earlier. The game theoretic algorithm, however, assumed the expected value of the RCS. Table II depicts the average SDR obtained using 100 random realisations of the RCS, where the players use the waveforms resulting from the game theoretic algorithm (waveforms at equilibrium) and waveforms that have been chosen randomly. As seen in Table II, the performance of the radars is still much better when the players follow the potential game as compared to the random selection of waveforms, confirming the benefit of the proposed approach even in the absence of exact knowledge of the RCS.

V. CONCLUSION
We have proposed a game theoretic waveform allocation method for a network of multistatic radars. Using results on discrete concavity, we have proven that the Nash equilibrium of the game is unique for a specific set of waveforms that can be carefully chosen by the radar network. The uniqueness of the Nash equilibrium allows each cluster of radars to determine the best set of waveforms by strategising the actions of the other clusters; hence, distributed optimisation without a need for intercluster communication is possible. The results show that the proposed game theoretic method provides better performance as compared to random allocation of waveforms, even in the absence of exact knowledge of the instantaneous RCS of the clutter.

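PROPOSITION 1 The potential function $P : A \to \mathbb{R}$ given by (5) satisfies the LMP.
PROOF See the Appendix.
COROLLARY 1.1 The set of Nash equilibria of the game $\mathcal{G} = \langle \mathcal{C}, \{A_k\}_{k \in \{1, \ldots, K\}}, \{u_k\}_{k \in \{1, \ldots, K\}} \rangle$ and the set $\{\mathbf{w}^* \in A \mid \mathbf{w}^* = \arg\max_{\mathbf{w} \in A} P(\mathbf{w})\}$ are the same.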

Fig. 1. Values of SDR throughout potential game and comparison with SDR evaluated using random choice of waveforms. Network consists of three clusters each with two radars. (a) Player 1; (b) player 2; (c) player 3.

Fig. 2. Change in SDR values for cluster C 1 as size of network increases.

TABLE I Antenna Gains in Decibels; the Notation $G_{\ell t k n}$ Denotes the Antenna Gain of the $n$th Radar in Cluster $k$ in the Direction of the Radar $t$ in the $\ell$th Cluster.

TABLE II Average SDR Evaluated at Waveforms Obtained Using the Game Theoretic Algorithm and Waveforms Chosen at Random; the SDR Has Been Calculated Taking into Account the Instantaneous Radar Cross Section Coefficients.