Quantum Mean-Field Games with the Observations of Counting Type

Abstract: Quantum games and mean-field games (MFGs) represent two important new branches of game theory. In a recent paper the author developed quantum MFGs merging these two branches. These quantum MFGs were based on the theory of continuous quantum observations and filtering of diffusive type. In the present paper we develop the analogous quantum MFG theory based on continuous quantum observations and filtering of counting type. However, proving existence and uniqueness of solutions for the resulting limiting forward-backward system, which is based on jump-type processes on manifolds, seems to be more complicated than for diffusions. In this paper we only prove that if a solution exists, then it gives an ε-Nash equilibrium for the corresponding N-player quantum game. The existence of solutions is suggested as an interesting open problem.

Using approaches from [9,10], one can transform any game into a new quantum version. This transformation modifies in a systematic way all properties of the game: equilibria, their stability, etc. For instance, stability of the equilibria of the transformed replicator dynamics for two-player two-action games was analyzed in [14]. ESS (evolutionarily stable strategies) for the transformed Rock-Paper-Scissors game were analyzed in [15], and for three-player games in [16]. The transformations of the simplest cooperative games were analyzed in [17]. In [18] the EWL (Eisert, Wilkens and Lewenstein) protocol was applied to the Battle of the Sexes, in [19] to the general prisoner's dilemma and in [20] to the three-player quantum prisoner's dilemma. Peculiar behavior and remarkable phase transitions were found. An extension of the EWL protocol to games with a continuous strategy space was suggested in [21].
For application of related quantum concepts (including quantum probability) to cognitive sciences we refer to [22,23] and references therein.
The main emphasis in all these developments has been on stationary or repeated games; see, e.g., [24,25] for the latter and [26] for their interpretation in economics. This is true not only for games: the mainstream of quantum control research in general is based on open-loop controls, with feedback control appearing only rarely; see, e.g., [27] and [28].
The present paper initiates the study of the truly dynamic theory with observations of counting type and with the strategies chosen by players in real time. Since direct continuous observations are known to destroy quantum evolutions (the so-called quantum Zeno paradox), the necessary new ingredient for quantum dynamic games must be the theory of non-direct observations and the corresponding quantum filtering. This theory is usually developed in two forms: the diffusive (or homodyne) type and the counting type. In paper [1] the author developed quantum MFGs based on diffusive type filtering. In the present paper quantum MFGs are built for counting type quantum observations and filtering.
As a part of the construction we show that the limiting behavior of mean-field interacting controlled quantum particles (or of an N-player quantum game) can be described by a certain classical MFG forward-backward system of jump-type equations on manifolds, the forward part being given by a new kind of nonlinear jump-type stochastic Schrödinger equation. One of the objectives of the paper is to draw the attention of game theorists to this type of games and this type of forward-backward systems, which have not been studied before and for which no results are available, even on the existence of solutions. These objects are fully classical, but represent the limit of quantum games.
The main result states that any solution to this forward-backward system represents an approximate Nash equilibrium, more precisely an ε-Nash equilibrium with ε of order $N^{-1/4}$, for the initial N-player dynamic quantum game.
The content of the paper is as follows. In the next section we recall the basic theory of quantum continuous measurement and filtering. In Section 3, as a warm-up, we discuss briefly an example of a two-player quantum dynamic game on a qubit with observation and feedback control of counting type. In Section 4 the new nonlinear equations are introduced for the case of controlled counting detection, and the convergence of N-particle observed quantum evolutions to the decoupled system of these equations is obtained, together with explicit rates of convergence. In Section 5 the MFG limits for quantum N-player games are introduced and it is proven that solutions of the limiting MFG equations specify ε-Nash equilibria for the N-player quantum game, with ε of order $N^{-1/4}$. The limiting MFGs can also be looked at as classical MFGs, though complex-valued and evolving in infinite-dimensional manifolds. In the final section we state the problem of existence of solutions, even in the simplest case of the control problem on a qubit.

Quantum Filtering of Counting Type
The general theory of quantum non-demolition observation, filtering and resulting feedback control was built essentially in papers [29][30][31]. For alternative simplified derivations of the main filtering equations given below (by-passing the heavy theory of quantum filtering) we refer to [32][33][34][35][36] and references therein. For the technical side of organising feedback quantum control in real time, see, e.g., [37][38][39].
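We shall describe briefly the main result of this theory. The non-demolition measurement of quantum systems can be organised in two versions: photon counting and homodyne detection. As was stressed above, here we shall deal only with counting measurements. In this case the main equation of quantum filtering takes the form

$$d\gamma_t = -i[H,\gamma_t]\,dt + \sum_j \Big(-\frac{1}{2}\{L_j^*L_j,\gamma_t\} + \mathrm{tr}(L_j^*L_j\gamma_t)\,\gamma_t\Big)dt + \sum_j \Big(\frac{L_j\gamma_tL_j^*}{\mathrm{tr}(L_j^*L_j\gamma_t)} - \gamma_t\Big)dN_t^j \tag{1}$$

in terms of the density matrices $\gamma_t$, where $\{A,B\} = AB + BA$, $H$ is the Hamiltonian of the free (not observed) motion of a quantum system, the operators $\{L_j\}$ define the coupling of the system with the measurement devices, and the (counting) observed Poisson processes $N_t^j$ are independent and have the position dependent intensities $\mathrm{tr}(L_j^*L_j\gamma_t)$, so that the compensated processes $M_t^j = N_t^j - \int_0^t \mathrm{tr}(L_j^*L_j\gamma_s)\,ds$ are martingales. In terms of these martingales, Equation (1) rewrites as

$$d\gamma_t = -i[H,\gamma_t]\,dt + \sum_j \Big(L_j\gamma_tL_j^* - \frac{1}{2}\{L_j^*L_j,\gamma_t\}\Big)dt + \sum_j \Big(\frac{L_j\gamma_tL_j^*}{\mathrm{tr}(L_j^*L_j\gamma_t)} - \gamma_t\Big)dM_t^j. \tag{2}$$

In this paper we shall deal only with the simplest case when the operators $L_j$ are unitary. In this case $dM_t^j = dN_t^j - dt$ and Equations (1) and (2) become linear and take the form

$$d\gamma_t = -i[H,\gamma_t]\,dt + \sum_j (L_j\gamma_tL_j^* - \gamma_t)\,dN_t^j. \tag{3}$$

This dynamics preserves the set of pure states. Namely, if $\varphi_t$ satisfies the equation

$$d\varphi_t = -iH\varphi_t\,dt + \sum_j (L_j - 1)\varphi_t\,dN_t^j, \tag{4}$$

then $\gamma_t = \varphi_t \otimes \bar\varphi_t$ satisfies (3).

The theory of quantum filtering reduces the analysis of quantum dynamic control and games to the controlled version of evolutions (1). Two types of control can be naturally considered (see [40]). The players can control the Hamiltonian $H$, say, by applying appropriate electric or magnetic fields to the atom, or the coupling operators $L_j$. Thus (3) extends to the equation

$$d\gamma_t = -i[H + u\hat H,\gamma_t]\,dt + \sum_j \big(L_j(v)\gamma_tL_j^*(v) - \gamma_t\big)\,dN_t^j \tag{5}$$

with some self-adjoint $\hat H$, control $u$ and a family of unitary operators $L(v)$ depending on a control parameter $v$. It is seen from Equation (5) that its evolution preserves traces of matrices. One can also show that these evolutions preserve positivity of matrices $\gamma$ (see, e.g., [36]).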
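The pure-state dynamics with a unitary coupling operator can be illustrated numerically. The following is a minimal sketch, not the author's construction: it assumes a single counting channel of unit intensity, a qubit with a diagonal Hamiltonian, and the Pauli matrix σ_1 as an illustrative unitary L; the helper names (`evolve_diag`, `simulate`, `norm2`) are hypothetical. Between counts the state evolves by the free Hamiltonian flow, and at each count it jumps from ψ to Lψ, so the norm should be preserved along every trajectory.

```python
import cmath
import random

def evolve_diag(psi, energies, dt):
    """Free evolution exp(-i H dt) for a diagonal Hamiltonian H = diag(energies)."""
    return [cmath.exp(-1j * e * dt) * a for e, a in zip(energies, psi)]

def apply(matrix, psi):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-vector."""
    return [sum(m * a for m, a in zip(row, psi)) for row in matrix]

def simulate(psi0, energies, L, horizon, rng):
    """Sample one path of d(psi) = -iH psi dt + (L - 1) psi dN, with N a unit-rate Poisson process."""
    psi, t, jumps = list(psi0), 0.0, 0
    while True:
        wait = rng.expovariate(1.0)      # waiting time until the next count
        if t + wait >= horizon:
            return evolve_diag(psi, energies, horizon - t), jumps
        psi = evolve_diag(psi, energies, wait)
        psi = apply(L, psi)              # a count maps psi to L psi
        t += wait
        jumps += 1

def norm2(psi):
    """Squared norm of the state vector."""
    return sum(abs(a) ** 2 for a in psi)

rng = random.Random(0)
sigma1 = [[0, 1], [1, 0]]                # illustrative unitary coupling operator
psi0 = [3 / 5, 4j / 5]                   # normalised initial state
psi_T, n_jumps = simulate(psi0, [1.0, -1.0], sigma1, 5.0, rng)
print(n_jumps, norm2(psi_T))
```

Because both the free flow and the jump operator are unitary, the printed squared norm stays equal to 1 up to rounding, which is the pure-state counterpart of trace preservation for the density matrix.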

Example of a Quantum Dynamic Two-Player Game
Let us stress again that the whole physics of quantum dynamic games with a feedback control of a finite number of players is incorporated into the stochastic filtering Equation (1), so that the quantum dynamic games are reduced to the stochastic games with jumps governed by this equation with operators H and L that may depend on control. As a warm-up before the mean-field setting let us consider the simple example of a zero-sum quantum dynamic two-player game on a qubit, where a complete analytic solution can be found.
Working with a qubit means that the Hilbert space of the quantum system is two dimensional. Let $L$ be fixed and the Hamiltonian be the sum of two parts, controlled by the first and the second player respectively. Stochastic filtering Equation (1) simplifies to the equation (omitting index $t$)

$$d\psi = -i(uH_1 + vH_2)\psi\,dt + (L - 1)\psi\,dN_t, \tag{6}$$

$u$, $v$ being control parameters of players I and II. Assume $u \in [-U, U]$, $v \in [-V, V]$. Moreover, $\psi$ has only two coordinates: $\psi = (\psi_0, \psi_1)$. Using Ito's rule $dN_t\,dN_t = dN_t$ and writing $H = uH_1 + vH_2$, we find the equation for $\psi_0^{-1}$:

$$d(\psi_0^{-1}) = i\,\psi_0^{-2}(H\psi)_0\,dt + \big((L\psi)_0^{-1} - \psi_0^{-1}\big)\,dN_t.$$

Consequently, again by Ito's rule, we find the equation for $w = \psi_1/\psi_0$:

$$dw = -i\big[(HW)_1 - w\,(HW)_0\big]\,dt + \Big(\frac{(LW)_1}{(LW)_0} - w\Big)\,dN_t, \tag{7}$$

where $W = (w_0, w_1) = (1, w)$. Let us choose the simplest possible $L$: $L = \sigma_3$, the third Pauli matrix (diagonal with diagonal elements 1 and −1). Then Equation (7) simplifies to

$$dw = -i\big[(HW)_1 - w\,(HW)_0\big]\,dt - 2w\,dN_t. \tag{8}$$

The payoffs in the quantum setting are given by certain operators, that is, they have the form

$$\int_t^T (J\psi_s, \psi_s)\,ds + (F\psi_T, \psi_T), \tag{9}$$

where $J$ and $F$ are some self-adjoint operators. They may depend on the control parameters, but we shall look at the case when they do not. In terms of $w$ this payoff rewrites as

$$\int_t^T \frac{(JW_s, W_s)}{1 + |w_s|^2}\,ds + \frac{(FW_T, W_T)}{1 + |w_T|^2}. \tag{10}$$

Thus the zero-sum quantum dynamic two-player game (with a feedback control) with a fixed horizon $T$ in this setting is the stochastic dynamic game with the state space $\mathbf{C}$, with the evolution described by the jump-type stochastic Equation (8) and payoff (10). The aim of the first player is to maximise the expectation of (10) using an appropriate feedback strategy $u(\cdot) = u(t, W_t)$. The second player tries to minimise it using an appropriate feedback strategy $v(\cdot) = v(t, W_t)$.
The remarkable feature of this game is that the possible jumps are only of type $w \to -w$. Consequently, in the coordinates $r = x^2 + y^2$ and $\xi = y/x$ (where $w = x + iy$), the dynamics is deterministic. Therefore, if the operators $J$ and $F$ of current and terminal payoffs are invariant under the transformation $w \to -w$, the game can be reduced to a deterministic differential game. This game is still very complicated.
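The claim about the jumps can be checked directly. The sketch below assumes, as in the text, that a count maps the state ψ to σ_3 ψ and that w = ψ_1/ψ_0; the function names are hypothetical. It confirms that the jump sends w to −w and leaves the coordinates r = x² + y² and ξ = y/x unchanged.

```python
def ratio(psi):
    """Projective coordinate w = psi_1 / psi_0 of a qubit state."""
    return psi[1] / psi[0]

def sigma3(psi):
    """Action of the third Pauli matrix diag(1, -1)."""
    return [psi[0], -psi[1]]

def polar_invariants(w):
    """Coordinates r = x^2 + y^2 and xi = y / x for w = x + iy (x assumed nonzero)."""
    return (w.real ** 2 + w.imag ** 2, w.imag / w.real)

psi = [0.6 + 0.2j, -0.3 + 0.7j]   # arbitrary state with psi_0 != 0
w_before = ratio(psi)
w_after = ratio(sigma3(psi))      # value of w right after a count

print(w_before, w_after)
print(polar_invariants(w_before), polar_invariants(w_after))
```

Since the only randomness in the dynamics enters through these jumps, any quantity that is unchanged by w → −w evolves deterministically, which is what allows the reduction to a differential game.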
Let us consider now the most trivial example of commuting operators $H_1$ and $H_2$ controlled by two players. To be concrete, let us choose $H_1$ diagonal with diagonal elements 1 and 0, and $H_2$ diagonal with diagonal elements 0 and 1. Then Equation (8) becomes linear in $w$:

$$dw = i(u - v)\,w\,dt - 2w\,dN_t, \tag{11}$$

and then the squared modulus $\rho = |w|^2$ becomes an integral of motion: $d(|w|^2) = 0$. Choosing $\rho = 1$ for definiteness, so that $w = e^{i\varphi}$, we get the equation for the angle $\varphi$ on the circle $\rho = 1$:

$$d\varphi = (u - v)\,dt + \pi\,dN_t. \tag{12}$$

If $J$ and $F$ are invariant under the transformation $w \to -w$, we can identify points where $\cos\varphi$ differs only by a sign (so that the possible jumps $\cos\varphi \to -\cos\varphi$ become irrelevant), and the evolution on a circle, given by the set $\varphi \in [-\pi/2, \pi/2]$ with identified endpoints, becomes deterministic: it is a simple rotation. Let us choose $F = 0$ and the simplest nontrivial $J$, with zero diagonal elements and a real number $j$ as both off-diagonal elements. The payoff (10) for $\rho = |w|^2 = 1$ simplifies to

$$\int_t^T j\cos\varphi_s\,ds. \tag{13}$$

The HJB-Isaacs equation takes the form

$$\frac{\partial S}{\partial t} + \max_u \min_v \Big[(u - v)\frac{\partial S}{\partial\varphi}\Big] + j\cos\varphi = 0. \tag{14}$$

Assuming for definiteness that $U > V$, so that the first player has an edge in this game, the equation rewrites as

$$\frac{\partial S}{\partial t} + (U - V)\Big|\frac{\partial S}{\partial\varphi}\Big| + j\cos\varphi = 0. \tag{15}$$

This is the HJB equation of a pure maximisation problem. It can be solved via the method of viscosity solutions. For instance, let us find a stationary solution describing the average winnings of the first player per unit of time in a long-lasting game. For this one searches for a solution to (15) in the form $S = \lambda(T - t) + S_0(\varphi)$ with a constant $\lambda$. Then $S_0(\varphi)$ (obviously defined up to an additive constant, so that we can set $S_0(0) = 0$) satisfies the equation

$$(U - V)\Big|\frac{\partial S_0}{\partial\varphi}\Big| + j\cos\varphi = \lambda. \tag{16}$$

To guess the right solution one can derive from the meaning of this equation that $S_0$ must be an even function of $\varphi$ with maximum at $\varphi = 0$, decreasing on $[0, \pi/2]$. Hence $(\partial S_0/\partial\varphi)(0) = 0$ and thus $\lambda = j$, and Equation (16) yields, for $\varphi \in [-\pi/2, \pi/2]$,

$$S_0(\varphi) = -\frac{j}{U - V}\big(|\varphi| - \sin|\varphi|\big). \tag{17}$$

This function (considered as periodically continued with period $\pi$ to the whole line) is smooth outside the points $(2k + 1)\pi/2$, where it has convex kinks.
Hence this is really the viscosity solution to (16) confirming that our educated guess above was correct and that λ = j is the income per unit of time to the first player for a long lasting game.
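The stationary solution can be sanity-checked numerically. The sketch below rests on assumptions that should be read as such: it takes the stationary equation in the form (U − V)|S_0′(φ)| + j cos φ = λ on [−π/2, π/2] and the candidate S_0(φ) = −j(|φ| − sin|φ|)/(U − V), both reconstructed from the surrounding derivation rather than quoted, and it uses illustrative values j = 2, U = 3, V = 1. The function names are hypothetical.

```python
import math

def s0(phi, j, U, V):
    """Candidate stationary solution, continued with period pi from [-pi/2, pi/2)."""
    phi = (phi + math.pi / 2) % math.pi - math.pi / 2   # reduce to [-pi/2, pi/2)
    a = abs(phi)
    return -j * (a - math.sin(a)) / (U - V)

def residual(phi, j, U, V, lam, h=1e-6):
    """Residual of (U - V)|S_0'| + j cos(phi) - lam, using a central difference for S_0'."""
    ds = (s0(phi + h, j, U, V) - s0(phi - h, j, U, V)) / (2 * h)
    return (U - V) * abs(ds) + j * math.cos(phi) - lam

j, U, V = 2.0, 3.0, 1.0            # illustrative parameters with U > V
points = [-1.2, -0.7, 0.1, 0.5, 1.0, 1.4]
residuals = [residual(p, j, U, V, lam=j) for p in points]
print(residuals)
```

With λ = j the residuals vanish up to discretisation error at interior points of the interval, consistent with λ = j being the income per unit of time; the check says nothing about the kinks at the endpoints, where the viscosity interpretation is needed.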
Another example for the case of quantum control (without games) was given in [28].

Controlled Limiting Stochastic Equation
Let $X$ be a Borel space with a fixed Borel measure that we denote $dx$. For a linear operator $O$ in $L^2(X)$ we shall denote by $O_j$ the operator in $L^2(X^N)$ that acts on functions $f(x_1, \cdots, x_N)$ as $O$ acting on the variable $x_j$. For a linear operator $A$ in $L^2(X^2)$ we shall denote by $A_{ij}$ the operator in $L^2(X^N)$ that acts as $A$ on the variables $x_i$, $x_j$. Let $H$ and $\hat H$ be two self-adjoint operators in $L^2(X)$ and $A$ a self-adjoint integral operator in $L^2(X^2)$ with the kernel $A(x, y; x', y')$ that acts on the functions of two variables as

$$A\psi(x, y) = \int_{X^2} A(x, y; x', y')\,\psi(x', y')\,dx'\,dy'.$$

It is assumed that $A$ is symmetric in the sense that it takes symmetric functions $\psi(x, y)$ (symmetric with respect to permutation of $x$ and $y$) to symmetric functions.
Let us consider the quantum evolution of N particles driven by the interaction Hamiltonian Here continuous functions u j (t, γ) describe the controls of jth agent, who is supposed to have access to the jth subsystem, namely to the partial trace Γ In order to be able to carry out a feedback control we assume further that this quantum system is observed via coupling with the collection of (possibly controlled) identical oneparticle unitary families L(v). That is, we consider the filtering Equation (3) of the type The corresponding density matrix Γ N,t = Ψ N,t ⊗ Ψ N,t satisfies the equation of type (5): The main ingredient in the construction of quantum MFG theory is the quantum law of large numbers that states that as N → ∞, the limiting evolution of each particle (precise conditions are given in the theorem below) is described by the nonlinear stochastic equation where Aη t is the integral operator in L 2 (X) with the integral kernel A(x, y; x , y )η t (y, y ) dydy and η t (y, z) = E(ψ j,t (y)ψ j,t (z)).
The equation for the corresponding density matrix γ j,t = ψ j,t ⊗ψ j,t writes down as For the analysis of the limiting behavior we use an approach from [41,42], where the main measures of the deviation of the solutions Ψ N,t to N-particle systems from the product of the solutions ψ t to the Hartree equations are the following positive numbers from the interval [0, 1]: In the present stochastic case, these quantities depend not just on the number of particles in the product, but on the concrete choice of these particles. The proper stochastic analog of the quantity α N (t) is the collection of random variables where the latter equation holds by the definition of the partial trace. Here γ j,t is identified with the operator in L 2 (X N ) acting on the jth variable and Γ Since evolutions (20) preserve the set of operators with the unit trace, (23) rewrites as Assuming that all controls u j and v j are given by identical feedback functions u(t, γ), v(t, γ) and that the initial conditions for Equation (19) is the tensor product of i.i.d. random vectors, the expectations Eα N (t) = Eα N,j (t) are well defined (they do not depend on a particular choice of particles).
Expressions α N,j can be linked with the traces by the following inequalities, due to Knowles and Pickl: α N,j (t) ≤ tr|Γ (j) Let A be a symmetric self-adjoint integral operator A in L 2 (X 2 ) with a Hilbert-Schmidt kernel, that is a kernel A(x, y; x , y ) such that A(x, y; x y ) = A(y, x; y , x ), A(x, y; x , y ) = A(x , y ; x, y).
Let ψ j,t be solutions to Equation (21) with i.i.d. initial conditions ψ j,0 , ψ j,0 = 1. Let Ψ N,t be the solution to the N-particle Equation (19) with H u (N) given by (18) and with the initial condition Proof. By Ito's product rule for counting processes, with the Ito product rule being dN j t dN i t = δ j i dN j t . Let us denote by I and II the parts of the differential dα N,j (t) that contain L j and, respectively, not.
Starting with II we obtain, denoting Aη t j the operator Aη t acting on the jth variable, that and where for the last inequality we used (25). The term I I 1 was dealt with in [1] (proof of Theorems 3.1) yielding the estimate Let us turn to I. We have Since γ j,t and L k with k = j commute, it follows that all terms with k = j cancel. Taking into account other cancelation (arising from the unitarity of L j ) we obtain If L j would be constant, this expression would vanish. In the present controlled version, some work is required. First of all, writing γ j,t = 1 − q j,t we obtain To make the calculations more transparent, let us omit indices at v, γ, q, Γ. Thus where We can now estimate C 1 j,t as I I 2 above yielding With C 2 j,t yet another add-and-subtract manipulation is required. Namely, The first term is estimated as above yielding And the second one is estimated as Therefore, since M j t is a martingale and its differential does not contribute to the expectation, it follows that Applying Gronwall's inequality yields (30).

Quantum MFG
Let us consider the quantum dynamic game of N players, where the dynamics of the density matrix Γ N,t is given by the controlled dynamics of type (20): Assume as above that controls u j and v j of each jth player can be chosen from some bounded closed intervals [−U, U] and [−V, V] respectively, that the initial matrix is the product of iid states, and that the payoff of each player on the interval [t, T] is given by the expression where J and F are some operators in L 2 (X) expressing the current and the terminal costs of the agent, J j and F j denote their actions on the jth variable, constants c ≥ 0 measure the cost of applying control u.

Remark 1. (i) We choose the simplest payoff function. Of course a more general dependence on u, v is possible. As long as the payoff is convex in u and v, the results below remain valid. (ii) Everything also remains in force if only H or only L is controlled, that is, if either u or v is absent from all formulas.
Notice that by the property of the partial trace, the payoff (34) rewrites as so that it really depends explicitly only on the individual partial traces Γ (j) N,t , which can be considered as quantum analogs of the positions of classical particles.
Let us stress again that, after all equations arising from physics are written, our quantum dynamic N-player game can be formulated in fully classical terms. Namely, the goal of each jth player is to maximise the expectation of payoff (35) under the evolution (33) depending on all controls u = (u_j). The information available to the jth player is the 'position' of the jth player, which is the partial trace $\Gamma^{(j)}_{N,t}$. An additional technical assumption that we use in the analysis below is that the class of feedback strategies is restricted to Lipschitz continuous functions of partial traces. Therefore both the information setting and the technical assumptions are slightly different from the simpler setting of the two-player game of Section 3, where players were assumed to define their strategies on the basis of the whole state (not a partial trace). The restriction to partial traces is necessary to uncouple the dynamics in the limit N → ∞.
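The role of the partial trace as a player's 'position' can be made concrete on the smallest possible example. The sketch below is an illustration under simplifying assumptions, not part of the construction above: it takes N = 2 and a qubit for each player, so that the joint density matrix is 4×4, and the helper names (`partial_trace`, `outer`, `kron_vec`) are hypothetical. For a product state the partial trace returns the individual state; for an entangled state it returns a mixed state.

```python
import math

def outer(psi):
    """Density matrix psi (x) conj(psi) of a pure state."""
    return [[x * y.conjugate() for y in psi] for x in psi]

def kron_vec(p, q):
    """Tensor product of two qubit vectors; component index is 2*a + b."""
    return [x * y for x in p for y in q]

def partial_trace(rho, keep):
    """Trace out one qubit of a 4x4 density matrix; keep=0 keeps the first, keep=1 the second."""
    out = [[0j, 0j], [0j, 0j]]
    for a in range(2):
        for a2 in range(2):
            for b in range(2):
                if keep == 0:
                    out[a][a2] += rho[2 * a + b][2 * a2 + b]
                else:
                    out[a][a2] += rho[2 * b + a][2 * b + a2]
    return out

s = 1 / math.sqrt(2)
p, q = [0.6, 0.8j], [s, 1j * s]                      # two normalised one-particle states
product = outer(kron_vec(p, q))                      # i.i.d.-type product state
bell = outer([s, 0.0, 0.0, s])                       # maximally entangled state

print(partial_trace(product, 0))                     # equals outer(p)
print(partial_trace(bell, 0))                        # equals identity / 2
```

A feedback strategy that is a function of `partial_trace(rho, j)` only uses the information attributed to the jth player in the text.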
The limiting evolution of each player can be expected to be described by the equations with η t (x, y) = lim For pure states γ j,t = ψ j,t ⊗ψ j,t this payoff turns to Let us say that the pair of functions (u, v) MFG t (γ) = (u, v) MFG (t, γ) with t ∈ [0, T] and γ from the set of density matrices in L 2 (X), and η MFG t (x, y) with x, y ∈ X, t ∈ [0, T], solve the limiting MFG problem if (i) (u, v) t (γ) is an optimal feedback strategy for the stochastic control problem (36), (37)  Proof. Assume that all players, except for one of them, say the first one, are playing according to the MFG strategy (u, v) MFG (t, Γ (j) N,t ), j > 1, and the first player is following some other strategy (ũ,ṽ)(t, Γ N,t ). By the law of large numbers (which is not affected by a single deviation), all η j t are equal and are given by the formula η t = Eγ j,t for all j > 1. Moreover, Eα N,j (t) = Eα N (t) are the same for all j > 1.
Following the proof of Theorem 1 we obtaiṅ α N,j (t) = I + I I 1 + I I 2 (39) with the same I, I I 1 , I I 2 , as in the proof of Theorem 1, N,t ) for j = 1. Looking first at j > 1 we note that up to an additive correction of magnitude not exceeding 4 A HS /N expression I I 1 can be substituted by the expression which is then dealt with exactly as in the proof of Theorem 1 (with N − 1 instead of N) yielding the same estimate (30) (with a corrected multiplier) for Eα N (t) = Eα N,j (t), j > 1, that is The same estimate is obtained for Eα N,1 (t) (even without the correcting term 4 A HS ) yielding Eα N,j (t) ≤ C(T)N −1/2 for all j and a constant C(T) depending on A HS , κ, κ L , Ĥ .
We can now compare the expected payoffs (35) received by the players in the N-player quantum game with the expected payoff (37) received in the limiting game. For each jth player the difference is bounded by N,s − γ j,s |, and by (25), it follows that the expectation of the difference of the payoffs is bounded by with a constant C(T) depending on A HS , κ, κ L , Ĥ . But by the assumption of the Theorem, (u, v) MFG t is the optimal choice for the limiting optimization problem. Hence the claim of the theorem follows.
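The passage from the rate $N^{-1/2}$ for $\mathbf{E}\alpha_{N,j}$ to the rate $N^{-1/4}$ for the payoffs can be spelled out as follows. This is a sketch under stated assumptions: the inequality (25) is used only in the form $\mathrm{tr}|\Gamma^{(j)}_{N,s} - \gamma_{j,s}| \le c\sqrt{\alpha_{N,j}(s)}$ with an absolute constant $c$ (its value is not needed), $B$ stands for either of the bounded operators $J$, $F$, and only the operator part of the payoff is shown.

```latex
\begin{align*}
\bigl|\operatorname{tr}\bigl(B(\Gamma^{(j)}_{N,s}-\gamma_{j,s})\bigr)\bigr|
  &\le \|B\|\,\operatorname{tr}\bigl|\Gamma^{(j)}_{N,s}-\gamma_{j,s}\bigr|
   \le c\,\|B\|\sqrt{\alpha_{N,j}(s)},\\
\mathbf{E}\sqrt{\alpha_{N,j}(s)}
  &\le \sqrt{\mathbf{E}\,\alpha_{N,j}(s)}
   \le \sqrt{C(T)}\,N^{-1/4}.
\end{align*}
```

The second line uses Jensen's inequality for the concave square root together with the estimate $\mathbf{E}\alpha_{N,j}(s) \le C(T)N^{-1/2}$ obtained above; integrating the first line over $s \in [t, T]$ then gives a payoff difference of order $N^{-1/4}$.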

Discussion
The problem of proving existence or uniqueness for the solution of the limiting MFG on manifold seems to be nontrivial. We suggest it as an interesting open problem.
Let us give a bit more detail for the simplest case of two-dimensional Hilbert space (a qubit), as in Section 3.
When there is no control v (that is, operator L is constant) and there is no free (uncontrolled) part of the Hamiltonian, the limiting Equation (21) simplify to the equation (omitting indices j and t for simplicity) Moreover, ψ has only two coordinates: ψ = (ψ 0 , ψ 1 ). Using Ito's rule as in Section 3, we find the equation for w = ψ 1 /ψ 0 : (42) where W = (w 0 , w 1 ) = (1, w).
Already this equation on the complex plane C, describing optimal control for the individual quantum feedback control in a qubit, is quite nonstandard. Moreover, to deal with the corresponding forward-backward system one needs not only its well-posedness in a certain generalized sense, but also some continuous dependence on parameters. Perhaps some methods from [43] or [28] can be used to get insight into this problem.
As a future research direction it is worth mentioning the general development of the theory of the limiting classical mean-field games, which are mean-field games on infinite dimensional curvilinear manifolds based on Markov processes with jumps, highly fascinating and nontrivial objects. Of course usual questions of classical mean-field games on the connection between stationary and time dependent solutions are fully open here, as well as the theory of the corresponding master equation. On the other hand, quantum dynamic games of finite number of players (touched upon in Section 3) lead to new nonlinear functional-differential equations on manifolds of Hamilton-Jacobi or Isaacs type, which are also worthy of proper analysis.

Funding:
The author gratefully acknowledges the funding by the Russian Academic Excellence project '5-100'.

Conflicts of Interest:
The author declares no conflict of interest.