
Finite state approximation algorithms for average cost denumerable state Markov decision processes

  • Theoretical Papers
  • Published in: Operations-Research-Spektrum

Summary

In this paper we study three finite-state value iteration and policy iteration algorithms for denumerable state space Markov decision processes under the average cost criterion. The convergence of these algorithms is guaranteed under a scrambling-type recurrence condition and various “tail” conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of “important” states and controlling them as well as possible. The policy space algorithm consists of a value determination step for a given policy and a policy improvement step in which a “better” policy is found. In this way a sequence of improved policies is constructed, which is shown to converge to an optimal average cost policy.
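The general finite-state-approximation idea behind the value iteration schemes can be illustrated with a minimal sketch: truncate the denumerable state space at a level N, redirect transitions that would leave the truncated set into a boundary state, and run relative value iteration on the finite model. This is only an illustration of the underlying technique, not the paper's specific algorithms; the toy controlled queue, the truncation rule, and all numerical parameters below are made-up assumptions.

```python
import numpy as np

# Illustrative sketch only: relative value iteration on a truncated state
# space. The denumerable queue {0, 1, 2, ...} is cut off at level N, and
# any transition probability mass above N is lumped into the boundary
# state N. All parameters (arrival/service probabilities, costs) are toy
# values, not taken from the paper.

def build_truncated_mdp(N=10, p=0.4, q=(0.5, 0.8), serve_cost=(0.0, 2.0)):
    """Toy controlled queue: state i = queue length, action a = service rate."""
    A = len(q)
    P = np.zeros((A, N + 1, N + 1))  # P[a, i, j] = transition probability
    c = np.zeros((A, N + 1))         # c[a, i]    = one-step cost
    for a in range(A):
        for i in range(N + 1):
            c[a, i] = i + (serve_cost[a] if i > 0 else 0.0)  # holding + service cost
            if i == 0:
                P[a, i, 1] += p                  # an arrival occurs
                P[a, i, 0] += 1.0 - p
            else:
                up = p * (1.0 - q[a])            # arrival, no service completion
                down = q[a] * (1.0 - p)          # service completion, no arrival
                P[a, i, min(i + 1, N)] += up     # mass above N lumped into state N
                P[a, i, i - 1] += down
                P[a, i, i] += 1.0 - up - down
    return P, c

def relative_value_iteration(P, c, tol=1e-9, max_iter=100_000):
    """Return (average cost estimate g, relative values v) for the finite MDP."""
    _, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        v_new = (c + P @ v).min(axis=0)       # dynamic programming update
        diff = v_new - v
        g_lo, g_hi = diff.min(), diff.max()   # bounds bracketing the average cost
        v = v_new - v_new[0]                  # normalize so that v[0] = 0
        if g_hi - g_lo < tol:                 # span criterion: bounds have met
            break
    return 0.5 * (g_lo + g_hi), v
```

The span of the successive differences brackets the average cost, so the stopping rule also certifies the accuracy of the returned estimate.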

Zusammenfassung (translated from the German)

For Markov decision processes with a denumerable state space, we study three finite value iteration and policy iteration algorithms under the average cost criterion. The convergence of the algorithms is guaranteed by scrambling-type recurrence conditions and various “tail” conditions on the transition probabilities. With the value iteration schemes we construct nearly optimal policies by concentrating on a finite set of “important” states and controlling them as well as possible. The policy iteration algorithm consists of a value determination step for a given policy and a policy improvement step. In this way a sequence of improved policies is constructed, and its convergence to the optimal policy is shown.
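The policy space scheme described in both abstracts alternates a value determination step with a policy improvement step. A minimal Howard-style sketch of that alternation for the average cost criterion, on a small finite MDP with made-up numbers (not the paper's model or its specific algorithm):

```python
import numpy as np

# Illustrative sketch of average-cost policy iteration: value
# determination for the current policy, then a one-step improvement.
# The transition matrices and costs below are toy values.

def evaluate_policy(P, c, policy):
    """Value determination: solve g + h(i) = c_pi(i) + sum_j P_pi(i,j) h(j), h(0) = 0."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S), :]   # transition matrix under the policy
    c_pi = c[policy, np.arange(S)]      # cost vector under the policy
    # Unknowns: g and h(1), ..., h(S-1); h(0) is fixed to 0.
    A = np.zeros((S, S))
    A[:, 0] = 1.0                       # coefficient of g in every equation
    A[:, 1:] = (np.eye(S) - P_pi)[:, 1:]
    sol = np.linalg.solve(A, c_pi)
    return sol[0], np.concatenate(([0.0], sol[1:]))   # (g, h)

def policy_iteration(P, c):
    """Alternate value determination and policy improvement until stable."""
    S = P.shape[1]
    policy = np.zeros(S, dtype=int)
    while True:
        g, h = evaluate_policy(P, c, policy)
        Q = c + P @ h                        # Q[a, i]: one-step lookahead cost
        improved = Q.argmin(axis=0)
        # Keep the old action on ties so the iteration terminates.
        ties = np.isclose(Q[policy, np.arange(S)], Q.min(axis=0))
        improved[ties] = policy[ties]
        if np.array_equal(improved, policy):
            return g, h, policy
        policy = improved

# Toy 3-state, 2-action MDP: action 1 costs one unit more per step but
# drifts the chain toward the cheap state 0.
P = np.array([[[0.5, 0.5, 0.0], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]],
              [[0.8, 0.2, 0.0], [0.5, 0.4, 0.1], [0.4, 0.4, 0.2]]])
c = np.array([[0.0, 1.0, 3.0], [1.0, 2.0, 4.0]])
g, h, policy = policy_iteration(P, c)
```

On termination the returned pair (g, h) satisfies the average cost optimality equation, since the improvement step can no longer find a strictly better action in any state.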




Cite this article

Thomas, L.C., Stengos, D. Finite state approximation algorithms for average cost denumerable state Markov decision processes. OR Spektrum 7, 27–37 (1985). https://doi.org/10.1007/BF01719758
