Abstract
This paper suggests a procedure that constructs the Pareto frontier and efficiently computes the strong Nash equilibrium for a class of time-discrete ergodic controllable Markov chain games. The procedure finds the strong Nash equilibrium using the Newton optimization method, which presents a potential advantage for ill-conditioned problems. We formulate the solution of the problem based on the Lagrange principle, adding a Tikhonov regularization parameter to ensure both the strict convexity of the Pareto frontier and the existence of a unique strong Nash equilibrium. Then, any welfare optimum arises as a strong Nash equilibrium of the game. We prove the existence and characterization of the strong Nash equilibrium, which is one of the main results of this paper. The method is validated theoretically and illustrated with an application example.
Communicated by Kyriakos G. Vamvoudakis.
Appendices
Appendix A: Proof of Lemma 2.1
Proof
Indeed, by the Weierstrass theorem, any non-stationary bounded policy c(n) (defined on a compact set) necessarily contains a convergent subsequence realizing the relations
where \(V^{l}\left( c\right) \) is assumed to be a monotonically increasing functional of each component \(c_{ik}^{l^{\prime }}\) when the other components are fixed, and \( \underset{k\rightarrow \infty }{\limsup }V^{l}\left( c(n_{k})\right) {:}{=}V^{l}\left( c^{**}\right) \). This upper bound is reached by taking \(c(t)=c^{*}=c^{**}\), since
Appendix B: Proof of Theorem 4.1
Proof
-
(a)
First, let us prove that the Hessian matrix \(H{:}{=}\dfrac{\partial ^{2}}{\partial x\partial x^{\intercal }}{\mathcal {L}}_{\theta ,\delta }\left( x,\mu _{0},\mu _{1}\right) \) is strictly positive definite for all \(x\in {\mathbb {R}}^{n}\) and for some positive \(\theta \) and \(\delta \), i.e., \(H>0\). We have
$$\begin{aligned} \dfrac{\partial ^{2}}{\partial x^{2}}{\mathcal {L}}_{\theta ,\delta }\left( x,\mu _{0},\mu _{1}\right)&= \theta \dfrac{\partial ^{2}}{\partial x^{2}} V^{l}(x)+\delta I_{N\times N}\\&\ge \delta \left( 1+\dfrac{\theta }{\delta }\zeta ^{-}\right) I_{N\times N}>0 \quad \forall \, \delta >\theta \left| \zeta ^{-}\right| ,\\ \zeta ^{-}&{:}{=}\underset{x\in X_{adm}}{\min }\zeta _{\min }\left( \dfrac{ \partial ^{2}}{\partial x^{2}}V^{l}(x)\right) , \end{aligned}$$where \(\zeta _{\min }\) denotes the minimum eigenvalue, so that \(H>0\) if \(\delta >\theta \left| \zeta ^{-}\right| \). This means that the RLF (11) is strongly convex in x, and hence it has a unique minimal point, denoted below by \(x^{*}\).
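As a quick numerical illustration of the condition \(\delta >\theta \left| \zeta ^{-}\right| \), the following sketch (with a hypothetical \(2\times 2\) Hessian of \(V^{l}\), not taken from the paper) checks that adding the regularization term \(\delta I\) restores strict positive definiteness:

```python
# Hedged sketch: the regularization condition delta > theta*|zeta_minus| from
# part (a), illustrated on a toy 2x2 symmetric matrix (hypothetical values).
import math

def eig2x2_sym(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    tr, det = a + c, a * c - b * b
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    return tr / 2 - disc, tr / 2 + disc

# Toy indefinite Hessian of V^l: diag(1, -2), so zeta_minus = -2.
H = (1.0, 0.0, -2.0)
zeta_minus = min(eig2x2_sym(*H))

theta = 0.5
delta = theta * abs(zeta_minus) + 0.1   # choose delta > theta*|zeta_minus|

# Regularized Hessian theta*H + delta*I and its minimum eigenvalue.
Hreg = (theta * H[0] + delta, theta * H[1], theta * H[2] + delta)
lam_min, _ = eig2x2_sym(*Hreg)
print(lam_min > 0)  # -> True: strict positive definiteness
```
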
-
(b)
In view of the properties
$$\begin{aligned} \left( \nabla V^{l}\left( x\right) ,\left( y-x\right) \right) \le V^{l}\left( y\right) -V^{l}\left( x\right) ,\quad \left( \nabla V^{l}\left( x\right) ,\left( x-y\right) \right) \ge V^{l}\left( x\right) -V^{l}\left( y\right) , \end{aligned}$$valid for any convex function \(V^{l}\left( x\right) \) and any x, y, we have, for the RLF at any admissible points x, \(\mu _{0}\), \(\mu _{1}\) and for \(x_{t}^{*}=x^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{0,t}^{*}=\mu _{0}^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{1,t}^{*}=\mu _{1}^{*}\left( \theta _{t},\delta _{t}\right) \),
$$\begin{aligned}&\left( x-x_{t}^{*},\dfrac{\partial }{\partial x}{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) - \left( \mu _{0}-\mu _{0,t}^{*},\dfrac{\partial }{\partial \mu _{0}}{\mathcal {L}} _{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) \nonumber \\&\quad -\left( \mu _{1}-\mu _{1,t}^{*},\dfrac{\partial }{\partial \mu _{1}} {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \right) = {\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0,t}^{*},\mu _{1,t}^{*}\right) \nonumber \\&\quad -{\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t}^{*},\mu _{0},\mu _{1}\right) + \, \dfrac{\delta _{t}}{2}\left( \left\| x-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}-\mu _{1,t}^{*}\right\| ^{2}\right) , \nonumber \\ \end{aligned}$$(30)which, by the saddle-point condition (14), implies
$$\begin{aligned}&\theta _{t}\left( x-x_{t}^{*}\right) ^{\intercal }\dfrac{\partial }{ \partial x}V^{l}\left( x\right) +\left( x-x_{t}^{*}\right) ^{\intercal } \left[ A_\mathrm{eq}^{\intercal }\mu _{0}+A_\mathrm{ineq}^{\intercal }\mu _{1}+\delta _{t}x \right] \nonumber \\&\quad +\left( \mu _{0}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{0}-A_\mathrm{eq}x+b_\mathrm{eq}\right) +\left( \mu _{1}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{1}-A_\mathrm{ineq}x+b_\mathrm{ineq}\right) \nonumber \\&\quad \ge \dfrac{\delta _{t}}{2}\left( \left\| x-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}-\mu _{1,t}^{*}\right\| ^{2}\right) . \end{aligned}$$(31) -
(c)
Selecting in Eq. (31) \(x{:}{=}x^{*}\in X^{*}\), \(\mu _{0}=\mu _{0}^{*}\), \(\mu _{1}=\mu _{1}^{*}\), and using the complementary slackness conditions \( \left( \mu _{1}^{*}\right) _{i}\left( A_\mathrm{ineq}x^{*}-b_\mathrm{ineq}\right) _{i}=\left( \mu _{1,t}^{*}\right) _{i}\left( A_\mathrm{ineq}x_{t}^{*}-b_\mathrm{ineq}\right) _{i}=0 \), we obtain
$$\begin{aligned}&\theta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}V^{l}\left( x^{*}\right) +\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\left[ A_\mathrm{eq}^{\intercal }\mu _{0}^{*}+A_\mathrm{ineq}^{\intercal }\mu _{1}^{*}+\delta _{t}x^{*}\right] \\&\qquad +\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{0}^{*}-A_\mathrm{eq}x^{*}+b_\mathrm{eq}\right) \\&\qquad + \left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\left( \delta _{t}\mu _{1}^{*}-A_\mathrm{ineq}x^{*}+b_\mathrm{ineq}\right) \\&\quad \ge \dfrac{\delta _{t}}{2}\left( \left\| x^{*}-x_{t}^{*}\right\| ^{2}+\left\| \mu _{0}^{*}-\mu _{0,t}^{*}\right\| ^{2}+\left\| \mu _{1}^{*}-\mu _{1,t}^{*}\right\| ^{2}\right) \ge 0. \end{aligned}$$Simplifying the last inequality, we have
$$\begin{aligned} \theta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }\dfrac{ \partial }{\partial x}V^{l}\left( x^{*}\right) +\delta _{t}\left( x^{*}-x_{t}^{*}\right) ^{\intercal }x^{*}+\delta _{t}\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\mu _{0}^{*}+ \delta _{t}\left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\mu _{1}^{*}\ge 0. \end{aligned}$$Dividing both sides of this inequality by \(\delta _{t}\) and taking \(\dfrac{ \theta _{t}}{\delta _{t}}\underset{t\rightarrow \infty }{\rightarrow }0\), we get
$$\begin{aligned} 0\le \,\underset{t\rightarrow \infty }{\limsup }\left[ \left( x^{*}-x_{t}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-\mu _{0,t}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-\mu _{1,t}^{*}\right) ^{\intercal }\mu _{1}^{*}\right] . \end{aligned}$$(32)Then, there exist subsequences \(\delta _{k}\) and \( \theta _{k}\) \(\left( k\rightarrow \infty \right) \) along which the following limits exist:
$$\begin{aligned}&x_{k}^{*}=x^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{x}}^{*}, \\&\mu _{0,k}^{*}=\mu _{0}^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{\mu }}_{0}^{*}, \\&\mu _{1,k}^{*}=\mu _{1}^{*}\left( \theta _{k},\delta _{k}\right) \rightarrow {\tilde{\mu }}_{1}^{*}\text { as }k\rightarrow \infty . \end{aligned}$$Suppose that there exist two limit points for two different convergent subsequences, i.e., there exist the limits
$$\begin{aligned}&x_{k^{\prime }}^{*}=x^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow \bar{x}^{*}, \\&\mu _{0,k^{\prime }}^{*}=\mu _{0}^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow {\bar{\mu }}_{0}^{*}, \\&\mu _{1,k^{\prime }}^{*}=\mu _{1}^{*}\left( \theta _{k^{\prime }},\delta _{k^{\prime }}\right) \rightarrow {\bar{\mu }}_{1}^{*}\text { as }k^{\prime }\rightarrow \infty . \end{aligned}$$Then, on these subsequences one has
$$\begin{aligned} \begin{array}{c} 0\le \left( x^{*}-{\tilde{x}}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-{\tilde{\mu }}_{0}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-{\tilde{\mu }}_{1}^{*}\right) ^{\intercal }\mu _{1}^{*}, \\ 0\le \left( x^{*}-\bar{x}^{*}\right) ^{\intercal }x^{*}+\left( \mu _{0}^{*}-{\bar{\mu }}_{0}^{*}\right) ^{\intercal }\mu _{0}^{*}+\left( \mu _{1}^{*}-{\bar{\mu }}_{1}^{*}\right) ^{\intercal }\mu _{1}^{*}. \end{array} \end{aligned}$$It follows that the points \(\left( {\tilde{x}}^{*}, {\tilde{\mu }}_{0}^{*},{\tilde{\mu }}_{1}^{*}\right) \) and \(\left( \bar{x} ^{*},{\bar{\mu }}_{0}^{*},{\bar{\mu }}_{1}^{*}\right) \) correspond to the minimum point of the function \( s\left( x^{*},\mu _{0}^{*},\mu _{1}^{*}\right) {:}{=}\dfrac{1}{2} \left( \left\| x^{*}\right\| ^{2}+\left\| \mu _{0}^{*}\right\| ^{2}+\left\| \mu _{1}^{*}\right\| ^{2}\right) \), defined on \(X^{*}\otimes \Lambda ^{*}\), over all possible saddle-points of the non-regularized Lagrange function. But \( s\left( x^{*},\mu _{0}^{*},\mu _{1}^{*}\right) \) is strictly convex, so its minimum is unique, which gives \({\tilde{x}}^{*}=\bar{x}^{*}\), \({\tilde{\mu }}_{0}^{*}={\bar{\mu }}_{0}^{*}\), \({\tilde{\mu }}_{1}^{*}={\bar{\mu }}_{1}^{*}\). \(\square \)
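The minimum-norm selection effected by the Tikhonov term can be illustrated with a small sketch (a hypothetical unconstrained example, not the paper's game): \(f(x)=(x_{1}+x_{2}-1)^{2}\) has a whole line of minimizers, while the regularized problem \(\min _{x}\,\theta f(x)+\tfrac{\delta }{2}\Vert x\Vert ^{2}\) has a unique minimizer that approaches the minimum-norm point \((1/2,1/2)\) as \(\delta \rightarrow 0\):

```python
# Hedged sketch (not the paper's exact scheme): Tikhonov regularization picks
# out the minimum-norm minimizer. Closed form: x1 = x2 = 2*theta/(4*theta+delta).
def argmin_regularized(theta, delta, steps=20000, lr=1e-3):
    """Gradient descent on theta*f(x) + (delta/2)*||x||^2."""
    x1 = x2 = 0.0
    for _ in range(steps):
        g = 2.0 * (x1 + x2 - 1.0)          # gradient of f in each coordinate
        x1 -= lr * (theta * g + delta * x1)
        x2 -= lr * (theta * g + delta * x2)
    return x1, x2

for delta in (1.0, 0.1, 0.01):
    x1, x2 = argmin_regularized(theta=1.0, delta=delta)
    # As delta shrinks, (x1, x2) approaches the minimum-norm point (0.5, 0.5).
    print(delta, round(x1, 3), round(x2, 3))
```
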
Appendix C: Proof of Lemma 4.1
Proof
The statement follows from Eq. (30), since the points \(x_{t}^{*}=x^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{0,t}^{*}=\mu _{0}^{*}\left( \theta _{t},\delta _{t}\right) \), \(\mu _{1,t}^{*}=\mu _{1}^{*}\left( \theta _{t},\delta _{t}\right) \) are the extremal points of the function \({\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x,\mu _{0},\mu _{1}\right) \). \(\square \)
Appendix D: Proof of Theorem 4.2
Proof
In view of Eq. (21), it follows
For strongly convex (concave) functions, the following inequalities hold
By the Lipschitz property for the gradients of \({\mathcal {L}}_{\theta _{t},\delta _{t}}\left( x_{t},\mu _{0,t},\mu _{1,t}\right) \), we also have
By the \(\Lambda \)-inequality \(2\left( a,b\right) \le \left( a,\Lambda a\right) +\left( b,\Lambda ^{-1}b\right) \) valid for any vectors a, b and any matrix \(\Lambda >0,\) we get for \(\Lambda =I_{n\times n}\)
and for \(\Lambda =\varepsilon _{t}I\)
which leads to the following estimate
Substituting into Eq. (33) implies (with \( L{:}{=}\max \left\{ L_{x},L_{\mu _{0}},L_{\mu _{1}}\right\} \) and \(C{:}{=}4\max \left\{ C_{\theta }^{2},C_{\delta }^{2}\right\} \))
If a nonnegative sequence \(\left\{ u_{t}\right\} \) satisfies the recurrent inequality
then \(u_{t}\underset{t\rightarrow \infty }{\rightarrow }p\). Defining
and applying Eq. (23) of this theorem with \(p=0\), we obtain the desired result. \(\square \)
Appendix E: Description of the Newton Method
To find \(\lambda _{\delta }^{**}\), let us apply Newton's optimization method, which leads to the following procedure
where \({\mathrm {Pr}}_{\Delta ^{n}}\) is the projection operator onto the simplex. The derivative \(\Phi _{\theta ,\delta }^{^{\prime }}\left( \lambda _{t}\right) \) is given by
where the terms \(\dfrac{\hbox {d}}{\hbox {d}\lambda }V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \) may be approximated by the Euler method as
and the second derivative \(\Phi _{\theta ,\delta }^{^{\prime \prime }}\left( \lambda _{t}\right) \) for two players is given by
where the terms \(\dfrac{\hbox {d}^{2}}{\hbox {d}\lambda ^{2}}V^{l}\left( x^{*}\left( \lambda _{t}\right) \right) \) may be approximated by the Euler method as
Finally, the suggested numerical procedure with \(\Gamma _{t}=\gamma \) for finding \(\lambda _{\delta }^{**}\) is given, for the first derivative, by
and for the second derivative
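A minimal sketch of the projected Newton iteration described above, for a scalar \(\lambda \) on the 2-simplex parameterized as \((\lambda ,1-\lambda )\). The objective \(\Phi \), the finite-difference approximations of \(\Phi ^{\prime }\) and \(\Phi ^{\prime \prime }\), and the step size \(\gamma \) are illustrative stand-ins, not the paper's actual functions:

```python
# Hedged sketch of a projected Newton step on the 2-simplex; phi is a
# hypothetical stand-in for Phi_{theta,delta}(lambda).
def project_interval(lam, lo=0.0, hi=1.0):
    """Projection onto the simplex {(lam, 1-lam) : 0 <= lam <= 1}."""
    return min(max(lam, lo), hi)

def phi(lam):
    # Stand-in objective with minimizer at lam = 0.3.
    return (lam - 0.3) ** 2

def newton_on_simplex(lam0, gamma=1.0, h=1e-5, iters=50):
    lam = lam0
    for _ in range(iters):
        # Finite-difference approximations of Phi' and Phi''
        # (in the spirit of the Euler-type approximations above).
        d1 = (phi(lam + h) - phi(lam - h)) / (2 * h)
        d2 = (phi(lam + h) - 2 * phi(lam) + phi(lam - h)) / h ** 2
        lam = project_interval(lam - gamma * d1 / d2)
    return lam

print(newton_on_simplex(0.9))  # converges near the minimizer 0.3
```

For this quadratic stand-in, a single Newton step already lands at the minimizer; the projection only becomes active when an iterate leaves the simplex.
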
Clempner, J.B., Poznyak, A.S. Finding the Strong Nash Equilibrium: Computation, Existence and Characterization for Markov Games. J Optim Theory Appl 186, 1029–1052 (2020). https://doi.org/10.1007/s10957-020-01729-3