Summary
In this paper we study three finite-state value iteration and policy iteration algorithms for denumerable state Markov decision processes under the average cost criterion. Convergence of these algorithms is guaranteed by a scrambling-type recurrence condition and various "tail" conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of "important" states and controlling them as well as we can. The policy space algorithm consists of a value determination scheme associated with a policy and a policy improvement step in which a "better" policy is determined. A sequence of improved policies is thus constructed, which is shown to converge to the optimal average cost policy.
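The value iteration schemes described above rest on relative value iteration for the average cost criterion, applied to a finite set of states. The sketch below is an illustrative assumption, not the paper's construction: it shows plain relative value iteration on an already-finite toy MDP (the states, costs, and transition matrix are invented for illustration), where the paper instead works with finite truncations of a denumerable state space under recurrence conditions.

```python
# Hedged sketch: relative value iteration for the average-cost criterion on a
# finite MDP. Toy data and function names are illustrative, not from the paper.
import numpy as np

def relative_value_iteration(costs, P, tol=1e-9, max_iter=10_000):
    """costs[s, a]: one-step cost; P[s, a, t]: transition probability s -> t
    under action a. Returns (gain, bias, policy) for the average-cost MDP."""
    n_states, n_actions = costs.shape
    h = np.zeros(n_states)              # relative values (bias), h[0] pinned to 0
    for _ in range(max_iter):
        # Dynamic-programming step: q(s, a) = c(s, a) + sum_t P(s, a, t) h(t)
        q = costs + np.einsum('sat,t->sa', P, h)
        Th = q.min(axis=1)
        g = Th[0]                       # gain estimate from the reference state 0
        h_new = Th - g                  # renormalise so h_new[0] = 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = q.argmin(axis=1)
    return g, h, policy

# Toy 2-state, 2-action MDP (illustrative data only)
costs = np.array([[1.0, 2.0], [3.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
gain, bias, policy = relative_value_iteration(costs, P)
```

All transition rows of the toy example overlap, which plays the role of the scrambling-type condition the paper uses to guarantee convergence of such schemes.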
Zusammenfassung
For Markov decision processes with a denumerable state space we study, under the average cost criterion, three finite value iteration and policy iteration algorithms. Convergence of the algorithms is guaranteed by scrambling-type recurrence conditions and various "tail" conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of "important" states and controlling them as well as possible. The policy iteration algorithm consists of a value determination step for a policy and a policy improvement step. In this way a sequence of improved policies is constructed, and convergence to the optimal policy is shown.
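The policy iteration scheme described in the abstract alternates a value determination step with a policy improvement step. A minimal finite-state sketch of that two-step structure is given below; the toy MDP and helper names are illustrative assumptions, not the paper's algorithm, which performs these steps on finite approximations of a denumerable state space.

```python
# Hedged sketch: Howard-style policy iteration for the average-cost criterion
# on a finite unichain MDP. Toy data and names are illustrative only.
import numpy as np

def evaluate(policy, costs, P):
    """Value determination: solve h + g*1 - P_pi h = c_pi with h[0] = 0."""
    n = costs.shape[0]
    Ppi = P[np.arange(n), policy]            # (n, n) transitions under the policy
    cpi = costs[np.arange(n), policy]        # one-step costs under the policy
    M = np.eye(n) - Ppi
    # Unknown vector x = (h[1], ..., h[n-1], g); h[0] is pinned to 0.
    A = np.zeros((n, n))
    A[:, :n - 1] = M[:, 1:]                  # columns multiplying h[1..n-1]
    A[:, n - 1] = 1.0                        # column multiplying the gain g
    x = np.linalg.solve(A, cpi)
    h = np.concatenate(([0.0], x[:n - 1]))
    return x[n - 1], h                       # (gain, bias)

def policy_iteration(costs, P, eps=1e-10):
    n, _ = costs.shape
    policy = np.zeros(n, dtype=int)
    while True:
        g, h = evaluate(policy, costs, P)    # value determination step
        q = costs + np.einsum('sat,t->sa', P, h)
        best = q.argmin(axis=1)              # policy improvement step
        idx = np.arange(n)
        improve = q[idx, best] < q[idx, policy] - eps   # only strict improvements
        if not improve.any():
            return g, h, policy
        policy = np.where(improve, best, policy)

# Toy 2-state, 2-action MDP (illustrative data only)
costs = np.array([[1.0, 2.0], [3.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
gain, bias, policy = policy_iteration(costs, P)
```

Changing the policy only where the improvement is strict guarantees finite termination; each iteration produces a policy with no larger average cost, mirroring the sequence of improved policies whose convergence the paper establishes.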
Thomas, L.C., Stengos, D. Finite state approximation algorithms for average cost denumerable state Markov decision processes. OR Spektrum 7, 27–37 (1985). https://doi.org/10.1007/BF01719758