Summary
In this paper we study three finite-state value iteration and policy iteration algorithms for denumerable state Markov decision processes under the average cost criterion. Convergence of these algorithms is guaranteed by a scrambling-type recurrence condition and various "tail" conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of "important" states and controlling them as well as we can. The policy space algorithm consists of a value determination scheme associated with a policy and a policy improvement step in which a "better" policy is determined. A sequence of improved policies is thus constructed, which is shown to converge to the optimal average cost policy.
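The value iteration schemes described above rest on relative value iteration for the average cost criterion, applied to a finite set of states. The sketch below is an illustrative assumption, not the paper's construction: it shows plain relative value iteration on an already-finite toy MDP (the states, costs, and transition matrix are invented for illustration), where the paper instead works with finite truncations of a denumerable state space under recurrence conditions.

```python
# Hedged sketch: relative value iteration for the average-cost criterion on a
# finite MDP. Toy data and function names are illustrative, not from the paper.
import numpy as np

def relative_value_iteration(costs, P, tol=1e-9, max_iter=10_000):
    """costs[s, a]: one-step cost; P[s, a, t]: transition probability s -> t
    under action a. Returns (gain, bias, policy) for the average-cost MDP."""
    n_states, n_actions = costs.shape
    h = np.zeros(n_states)              # relative values (bias), h[0] pinned to 0
    for _ in range(max_iter):
        # Dynamic-programming step: q(s, a) = c(s, a) + sum_t P(s, a, t) h(t)
        q = costs + np.einsum('sat,t->sa', P, h)
        Th = q.min(axis=1)
        g = Th[0]                       # gain estimate from the reference state 0
        h_new = Th - g                  # renormalise so h_new[0] = 0
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    policy = q.argmin(axis=1)
    return g, h, policy

# Toy 2-state, 2-action MDP (illustrative data only)
costs = np.array([[1.0, 2.0], [3.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
gain, bias, policy = relative_value_iteration(costs, P)
```

All transition rows of the toy example overlap, which plays the role of the scrambling-type condition the paper uses to guarantee convergence of such schemes.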
Zusammenfassung
For Markov decision processes with a denumerable state space we study, under the average cost criterion, three finite value iteration and policy iteration algorithms. Convergence of the algorithms is guaranteed by scrambling-type recurrence conditions and various "tail" conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of "important" states and controlling them as well as possible. The policy iteration algorithm consists of a value determination step for a policy and a policy improvement step. In this way a sequence of improved policies is constructed, and convergence to the optimal policy is shown.
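The policy iteration scheme described in the abstract alternates a value determination step with a policy improvement step. A minimal finite-state sketch of that two-step structure is given below; the toy MDP and helper names are illustrative assumptions, not the paper's algorithm, which performs these steps on finite approximations of a denumerable state space.

```python
# Hedged sketch: Howard-style policy iteration for the average-cost criterion
# on a finite unichain MDP. Toy data and names are illustrative only.
import numpy as np

def evaluate(policy, costs, P):
    """Value determination: solve h + g*1 - P_pi h = c_pi with h[0] = 0."""
    n = costs.shape[0]
    Ppi = P[np.arange(n), policy]            # (n, n) transitions under the policy
    cpi = costs[np.arange(n), policy]        # one-step costs under the policy
    M = np.eye(n) - Ppi
    # Unknown vector x = (h[1], ..., h[n-1], g); h[0] is pinned to 0.
    A = np.zeros((n, n))
    A[:, :n - 1] = M[:, 1:]                  # columns multiplying h[1..n-1]
    A[:, n - 1] = 1.0                        # column multiplying the gain g
    x = np.linalg.solve(A, cpi)
    h = np.concatenate(([0.0], x[:n - 1]))
    return x[n - 1], h                       # (gain, bias)

def policy_iteration(costs, P, eps=1e-10):
    n, _ = costs.shape
    policy = np.zeros(n, dtype=int)
    while True:
        g, h = evaluate(policy, costs, P)    # value determination step
        q = costs + np.einsum('sat,t->sa', P, h)
        best = q.argmin(axis=1)              # policy improvement step
        idx = np.arange(n)
        improve = q[idx, best] < q[idx, policy] - eps   # only strict improvements
        if not improve.any():
            return g, h, policy
        policy = np.where(improve, best, policy)

# Toy 2-state, 2-action MDP (illustrative data only)
costs = np.array([[1.0, 2.0], [3.0, 0.5]])
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
gain, bias, policy = policy_iteration(costs, P)
```

Changing the policy only where the improvement is strict guarantees finite termination; each iteration produces a policy with no larger average cost, mirroring the sequence of improved policies whose convergence the paper establishes.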
Thomas, L.C., Stengos, D. Finite state approximation algorithms for average cost denumerable state Markov decision processes. OR Spektrum 7, 27–37 (1985). https://doi.org/10.1007/BF01719758