Abstract
This paper suggests a procedure that constructs the Pareto frontier and efficiently computes the strong Nash equilibrium for a class of time-discrete ergodic controllable Markov chain games. The procedure finds the strong Nash equilibrium using the Newton optimization method, which presents a potential advantage for ill-conditioned problems. We formulate the solution of the problem based on the Lagrange principle, adding a Tikhonov regularization parameter to ensure both the strict convexity of the Pareto frontier and the existence of a unique strong Nash equilibrium. Then, any welfare optimum arises as a strong Nash equilibrium of the game. We prove the existence and characterization of the strong Nash equilibrium, which is one of the main results of this paper. The method is validated theoretically and illustrated with an application example.
Communicated by Kyriakos G. Vamvoudakis.
Appendices
Appendix A: Proof of Lemma 2.1
Proof
Indeed, by the Weierstrass theorem, any non-stationary bounded policy c(n) (defined on a compact set) necessarily contains a convergent subsequence realizing the relations
where \(V^{l}\left( c\right) \) is assumed to be a monotonically increasing functional of each component \(c_{ik}^{l^{\prime }}\) when the other components are fixed, and \( \underset{k\rightarrow \infty }{\limsup }V^{l}\left( c(n_{k})\right) {:}{=}V^{l}\left( c^{**}\right) \). This upper bound is reached by taking \(c(t)=c^{*}=c^{**}\), since
Appendix B: Proof of Theorem 4.1
Proof
-
(a)
First, let us prove that the Hessian matrix \(H{:}{=}\dfrac{\partial ^{2}}{\partial x\partial x^{\intercal }}{\mathcal {L}}_{\theta ,\delta }\left( x,\mu _{0},\mu _{1}\right) \) is strictly positive definite for all \(x\in {\mathbb {R}}^{n}\) and for some positive \(\theta \) and \(\delta \), i.e., \(H>0\). We have
$$\begin{aligned} \dfrac{\partial ^{2}}{\partial x^{2}}{\mathcal {L}}_{\theta ,\delta }\left( x,\mu _{0},\mu _{1}\right)&= \theta \dfrac{\partial ^{2}}{\partial x^{2}} V^{l}(x)+\delta I_{N\times N}\\&\ge \delta \left( 1+\dfrac{\theta }{\delta }\zeta ^{-}\right) I_{N\times N}>0 \quad \forall \, \delta >\theta \left| \zeta ^{-}\right| ,\\ \zeta ^{-}&{:}{=}\underset{x\in X_{adm}}{\min }\zeta _{\min }\left( \dfrac{ \partial ^{2}}{\partial x^{2}}V^{l}(x)\right) , \end{aligned}$$where \(\zeta _{\min }\) denotes the minimum eigenvalue, so that \(H>0\) if \(\delta >\theta \left| \zeta ^{-}\right| \). This means that the RLF (11) is strongly convex in x, and hence it has a unique minimal point, denoted below by \(x^{*}\).
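As a quick numerical illustration of the condition \(\delta >\theta \left| \zeta ^{-}\right| \), the following sketch (with a hypothetical \(2\times 2\) Hessian of \(V^{l}\), not taken from the paper) checks that adding the regularization term \(\delta I\) restores strict positive definiteness:

```python
# Hedged sketch: the regularization condition delta > theta*|zeta_minus| from
# part (a), illustrated on a toy 2x2 symmetric matrix (hypothetical values).
import math

def eig2x2_sym(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 - disc, tr / 2 + disc

# Toy indefinite Hessian of V^l: diag(1, -2), so zeta_minus = -2.
H = (1.0, 0.0, -2.0)
zeta_minus = min(eig2x2_sym(*H))

theta = 0.5
delta = theta * abs(zeta_minus) + 0.1   # choose delta > theta*|zeta_minus|

# Regularized Hessian theta*H + delta*I and its minimum eigenvalue.
Hreg = (theta * H[0] + delta, theta * H[1], theta * H[2] + delta)
lam_min, _ = eig2x2_sym(*Hreg)
print(lam_min > 0)  # -> True: strict positive definiteness
```
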
-
(b)
In view of the properties
$$\begin{aligned} \left( \nabla V^{l}\left( x\right) ,\left( y-x\right) \right) \le V^{l}\left( y\right) -V^{l}\left( x\right) ,\quad \left( \nabla V^{l}\left( x\right) ,\left( x-y\right) \right) \ge V^{l}\left( x\right) -V^{l}\left( y\right) , \end{aligned}$$valid for any convex function \(V^{l}\left( x\right) \) and any x, y, we have, for the RLF at any admissible points x, \(\mu _{0}\), \(\mu _{1}\) and for \(x_{t}^{*}=x^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{0,t}^{*}=\mu _{0}^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{1,t}^{*}=\mu _{1}^{*}\left( \theta _{t},\delta _{t}\right) \),
$$\begin{aligned}&\left( x-x_{t}^{*},\dfrac{\partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) - \left( \mu _{0}-\mu _{0,t}^{*},\dfrac{\partial }{\partial \mu _{0}}{\mathcal {L}} _{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) \nonumber \\&\quad -\left( \mu _{1}-\mu _{1,t}^{*},\dfrac{\partial }{\partial \mu _{1}} {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) = {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0,t}^{*},\mu _{1,t}^{*}\right) \nonumber \\&\quad -{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t}^{*},\mu _{0},\mu _{1}\right) + \, \dfrac{\delta _{t}}{2}\left( \left\| x-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}-\mu _{1,t}^{*}\right\| ^{2}\right) , \nonumber \\ \end{aligned}$$(30)which, by the saddle-point condition (14), implies
$$\begin{aligned}&\theta _{t}\left( x-x_{t}^{*}\right) ^{\intercal }\dfrac{\partial }{ \partial x}V^{l}\left( x\right) +\left( x-x_{t}^{*}\right) ^{\intercal } \left[ A_\mathrm{eq}^{\intercal }\mu _{0}+A_\mathrm{ineq}^{\intercal }\mu _{1}+\delta _{t}x \right] \nonumber \\&\quad +\left( \mu _{0}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{0}-A_\mathrm{eq}x+b_\mathrm{eq}\right) +\left( \mu _{1}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{1}-A_\mathrm{ineq}x+b_\mathrm{ineq}\right) \nonumber \\&\quad \ge \dfrac{\delta _{t}}{2}\left( \left\| x-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}-\mu _{1,t}^{*}\right\| ^{2}\right) . \end{aligned}$$(31) -
(c)
Selecting in Eq. (31) \(x{:}{=}x^{*}\in X^{*}\), \(\mu _{0}=\mu _{0}^{*}\), \(\mu _{1}=\mu _{1}^{*}\), and using the complementary slackness conditions \( \left( \mu _{1}^{*}\right) _{i}\left( A_\mathrm{ineq}x^{*}-b_\mathrm{ineq}\right) _{i}=\left( \mu _{1,t}^{*}\right) _{i}\left( A_\mathrm{ineq}x_{t}^{*}-b_\mathrm{ineq}\right) _{i}=0 \), we obtain
$$\begin{aligned}&\theta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}V^{l}\left( x^{*}\right) +\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\left[ A_\mathrm{eq}^{\intercal }\mu _{0}^{*}+A_\mathrm{ineq}^{\intercal }\mu _{1}^{*}+\delta _{t}x^{*}\right] \\&\qquad +\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{0}^{*}-A_\mathrm{eq}x^{*}+b_\mathrm{eq}\right) \\&\qquad + \left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{1}^{*}-A_\mathrm{ineq}x^{*}+b_\mathrm{ineq}\right) \\&\quad \ge \dfrac{\delta _{t}}{2}\left( \left\| x^{*}-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}^{*}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}^{*}-\mu _{1,t}^{*}\right\| ^{2}\right) \ge 0. \end{aligned}$$Simplifying the last inequality, we have
$$\begin{aligned} \theta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}V^{l}\left( x^{*}\right) +\delta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }x^{*}+\delta _{t}\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\mu _{0}^{*}+ \delta _{t}\left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\mu _{1}^{*}\ge 0. \end{aligned}$$Dividing both sides of this inequality by \(\delta _{t}\) and taking \(\dfrac{ \theta _{t}}{\delta _{t}}\underset{t\rightarrow \infty }{\rightarrow }0\), we get
$$\begin{aligned} 0\le \,\underset{t\rightarrow \infty }{\limsup }\left[ \left( x^{*}-x_{t}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\mu _{1}^{*}\right] . \end{aligned}$$(32)Then, there exist subsequences \(\delta _{k}\) and \( \theta _{k}\) \(\left( k\rightarrow \infty \right) \) along which the following limits exist:
$$\begin{aligned}&x_{k}^{*}=x^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{x}}^{*}, \\&\mu _{0,k}^{*}=\mu _{0}^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{\mu }}_{0}^{*}, \\&\mu _{1,k}^{*}=\mu _{1}^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{\mu }}_{1}^{*}\text { as }k\rightarrow \infty . \end{aligned}$$Suppose that there exist two limit points for two different convergent subsequences, i.e., there exist the limits
$$\begin{aligned}&x_{k^{\prime }}^{*}=x^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow \bar{x}^{*}, \\&\mu _{0,k^{\prime }}^{*}=\mu _{0}^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow {\bar{\mu }}_{0}^{*}, \\&\mu _{1,k^{\prime }}^{*}=\mu _{1}^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow {\bar{\mu }}_{1}^{*}\text { as }k^{\prime }\rightarrow \infty . \end{aligned}$$Then, on these subsequences one has
$$\begin{aligned} \begin{array}{c} 0\le \left( x^{*}-{\tilde{x}}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-{\tilde{\mu }}_{0}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-{\tilde{\mu }}_{1}^{*}\right) ^{\intercal }\mu _{1}^{*}, \\ 0\le \left( x^{*}-\bar{x}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-{\bar{\mu }}_{0}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-{\bar{\mu }}_{1}^{*}\right) ^{\intercal }\mu _{1}^{*}. \end{array} \end{aligned}$$It follows that the points \(\left( {\tilde{x}}^{*}, {\tilde{\mu }}_{0}^{*},{\tilde{\mu }}_{1}^{*}\right) \) and \(\left( \bar{x} ^{*},{\bar{\mu }}_{0}^{*},{\bar{\mu }}_{1}^{*}\right) \) correspond to the minimum point of the function \( s\left( x^{*},\mu _{0}^{*},\mu _{1}^{*}\right) {:}{=}\dfrac{1}{2} \left( \left\| x^{*}\right\| ^{2}+\left\| \mu _{0}^{*}\right\| ^{2}+\left\| \mu _{1}^{*}\right\| ^{2}\right) \), defined on \(X^{*}\otimes \Lambda ^{*}\), over all possible saddle-points of the non-regularized Lagrange function. But \( s\left( x^{*},\mu _{0}^{*},\mu _{1}^{*}\right) \) is strictly convex, so its minimum is unique, which gives \({\tilde{x}}^{*}=\bar{x}^{*}\), \({\tilde{\mu }}_{0}^{*}={\bar{\mu }}_{0}^{*}\), \({\tilde{\mu }}_{1}^{*}={\bar{\mu }}_{1}^{*}\). \(\square \)
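The minimum-norm selection effected by the Tikhonov term can be illustrated with a small sketch (a hypothetical unconstrained example, not the paper's game): \(f(x)=(x_{1}+x_{2}-1)^{2}\) has a whole line of minimizers, while the regularized problem \(\min _{x}\,\theta f(x)+\tfrac{\delta }{2}\Vert x\Vert ^{2}\) has a unique minimizer that approaches the minimum-norm point \((1/2,1/2)\) as \(\delta \rightarrow 0\):

```python
# Hedged sketch (not the paper's exact scheme): Tikhonov regularization picks
# out the minimum-norm minimizer. Closed form: x1 = x2 = 2*theta/(4*theta+delta).
def argmin_regularized(theta, delta, steps=20000, lr=1e-3):
    """Gradient descent on theta*f(x) + (delta/2)*||x||^2."""
    x1 = x2 = 0.0
    for _ in range(steps):
        g = 2.0 * (x1 + x2 - 1.0)          # gradient of f in each coordinate
        x1 -= lr * (theta * g + delta * x1)
        x2 -= lr * (theta * g + delta * x2)
    return x1, x2

for delta in (1.0, 0.1, 0.01):
    x1, x2 = argmin_regularized(theta=1.0, delta=delta)
    # As delta shrinks, (x1, x2) approaches the minimum-norm point (0.5, 0.5).
    print(delta, round(x1, 3), round(x2, 3))
```
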
Appendix C: Proof of Lemma 4.1
Proof
The statement follows from Eq. (30), since the points \(x_{t}^{*}=x^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{0,t}^{*}=\mu _{0}^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{1,t}^{*}=\mu _{1}^{*}\left( \theta _{t},\delta _{t}\right) \) are the extremal points of the function \({\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \). \(\square \)
Appendix D: Proof of Theorem 4.2
Proof
In view of Eq. (21), it follows
For strongly convex (concave) functions, the following inequalities hold
By the Lipschitz property for the gradients of \({\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \), we also have
By the \(\Lambda \)-inequality \(2\left( a,b\right) \le \left( a,\Lambda a\right) +\left( b,\Lambda ^{-1}b\right) \) valid for any vectors a, b and any matrix \(\Lambda >0,\) we get for \(\Lambda =I_{n\times n}\)
and for \(\Lambda =\varepsilon _{t}I\)
which leads to the following estimate
Substituting into Eq. (33) implies (with \( L{:}{=}\max \left\{ L_{x},L_{\mu _{0}},L_{\mu _{1}}\right\} \) and \(C{:}{=}4\max \left\{ C_{\theta }^{2},C_{\delta }^{2}\right\} \))
If a nonnegative sequence \(\left\{ u_{t}\right\} \) satisfies the recurrent inequality
then \(u_{t}\underset{t\rightarrow \infty }{\rightarrow }p\). Defining
and applying Eq. (23) of this theorem with \(p=0\), we obtain the desired result. \(\square \)
Appendix E: Description of the Newton Method
To find \(\lambda _{\delta }^{**}\), let us apply Newton's optimization method, which leads to the following procedure
where \({\mathrm {Pr}}_{\Delta ^{n}}\) is the projection operator onto the simplex. The derivative \(\Phi _{\theta ,\delta }^{^{\prime }}\left( \lambda _{t}\right) \) is given by
where the terms \(\dfrac{\hbox {d}}{\hbox {d}\lambda }V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \) may be approximated by the Euler method as
and the second derivative \(\Phi _{\theta ,\delta }^{^{\prime \prime }}\left( \lambda _{t}\right) \) for two players is given by
where the terms \(\dfrac{\hbox {d}^{2}}{\hbox {d}\lambda ^{2}}V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \) may be approximated by the Euler method as
Finally, the suggested numerical procedure with \(\Gamma _{t}=\gamma \) for finding \(\lambda _{\delta }^{**}\) is given, for the first derivative, by
and for the second derivative
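A minimal sketch of the projected Newton iteration described above, for a scalar \(\lambda \) on the 2-simplex parameterized as \((\lambda ,1-\lambda )\). The objective \(\Phi \), the finite-difference approximations of \(\Phi ^{\prime }\) and \(\Phi ^{\prime \prime }\), and the step size \(\gamma \) are illustrative stand-ins, not the paper's actual functions:

```python
# Hedged sketch of a projected Newton step on the 2-simplex; phi is a
# hypothetical stand-in for Phi_{theta,delta}(lambda).
def project_interval(lam, lo=0.0, hi=1.0):
    """Projection onto the simplex {(lam, 1-lam) : 0 <= lam <= 1}."""
    return min(max(lam, lo), hi)

def phi(lam):
    # Stand-in objective with minimizer at lam = 0.3.
    return (lam - 0.3) ** 2

def newton_on_simplex(lam0, gamma=1.0, h=1e-5, iters=50):
    lam = lam0
    for _ in range(iters):
        # Finite-difference approximations of Phi' and Phi''
        # (in the spirit of the Euler-type approximations above).
        d1 = (phi(lam + h) - phi(lam - h)) / (2 * h)
        d2 = (phi(lam + h) - 2 * phi(lam) + phi(lam - h)) / h ** 2
        lam = project_interval(lam - gamma * d1 / d2)
    return lam

print(newton_on_simplex(0.9))  # converges near the minimizer 0.3
```

For this quadratic stand-in, a single Newton step already lands at the minimizer; the projection only becomes active when an iterate leaves the simplex.
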
Clempner, J.B., Poznyak, A.S. Finding the Strong Nash Equilibrium: Computation, Existence and Characterization for Markov Games. J Optim Theory Appl 186, 1029–1052 (2020). https://doi.org/10.1007/s10957-020-01729-3