
Estimation and control in multichain processes

Published in: Annals of Operations Research

Abstract

This paper considers discrete-time Markovian decision processes whose transition probabilities depend on an unknown parameter that may change from step to step. When this parameter sequence converges, a policy maximizing the average expected reward over an infinite horizon is sought. Under continuity conditions, a policy based on “estimation and control” is shown to be uniformly optimal for certain multichain models.
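The “estimation and control” idea can be illustrated by a certainty-equivalence loop: at each step, estimate the unknown transition parameter from the observed history, then act as if the estimate were the true value. The following is a minimal sketch on a hypothetical two-state, two-action chain (not the paper's model): action 0 leads to the rewarding state with known probability `theta0`, action 1 with unknown probability `theta1`, and occasional forced exploration keeps the estimate consistent.

```python
import random

def estimate_and_control(theta1_true=0.8, theta0=0.3, steps=5000, seed=0):
    """Certainty-equivalence 'estimation and control' on a toy two-state MDP.

    Hypothetical example for illustration only. Action 0 moves to state 1
    with known probability theta0; action 1 does so with unknown probability
    theta1. The reward is 1 in state 1 and 0 otherwise, so the average-reward
    optimal action is the one with the larger success probability.
    """
    rng = random.Random(seed)
    ones, trials = 0, 0        # sufficient statistics for estimating theta1
    total_reward = 0
    theta1_hat = 1.0           # optimistic initial estimate
    for t in range(steps):
        # Estimation step: maximum-likelihood estimate of theta1.
        if trials:
            theta1_hat = ones / trials
        # Control step: act greedily with respect to the estimated model,
        # with decaying forced exploration so the estimate converges.
        if rng.random() < 1.0 / (t + 1) ** 0.5:
            a = rng.choice([0, 1])
        else:
            a = 1 if theta1_hat > theta0 else 0
        p = theta1_true if a == 1 else theta0
        s_next = 1 if rng.random() < p else 0
        if a == 1:
            ones += s_next
            trials += 1
        total_reward += s_next
    return theta1_hat, total_reward / steps
```

Because the exploration probabilities are not summable-to-zero too quickly, the unknown action is sampled infinitely often, the estimate converges to the true parameter, and the long-run average reward approaches that of the optimal stationary policy.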




Cite this article

Girlich, H.-J., Sokolichin, A.A. Estimation and control in multichain processes. Ann Oper Res 32, 23–33 (1991). https://doi.org/10.1007/BF02204826
