Density estimation and adaptive control of Markov processes: Average and discounted criteria

Abstract

We consider a class of discrete-time Markov control processes with Borel state and action spaces, whose disturbances are i.i.d. random vectors in ℝ^d with unknown distribution μ. Under mild semicontinuity and compactness conditions, and assuming that μ is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies that are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results are obtained by exploiting well-known facts from the theory of density estimation. This approach allows us to avoid the restrictive conditions on the state space and/or the system's transition law imposed in recent works, and it also points the way to further applications of nonparametric (density) estimation to adaptive control.
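For concreteness, the LaTeX fragment below is a minimal sketch of the standard model behind the abstract. The notation (system function F, one-stage reward r, discount factor α, disturbance density ρ, policy π) is assumed for illustration only; the abstract itself fixes none of it.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Sketch of the standard Markov control model (assumed notation).
The state evolves as
\[
  x_{t+1} = F(x_t, a_t, \xi_t), \qquad t = 0, 1, 2, \ldots,
\]
where the disturbances $\xi_0, \xi_1, \ldots$ are i.i.d.\ random vectors
in $\mathbb{R}^d$ with common unknown distribution $\mu$. Absolute
continuity of $\mu$ with respect to Lebesgue measure means $\mu$ has a
density $\rho$, which can be estimated nonparametrically from the
observed disturbances. The two performance criteria are the long-run
average reward
\[
  J(\pi, x) := \liminf_{n \to \infty} \frac{1}{n}\,
    E_x^{\pi}\!\left[ \sum_{t=0}^{n-1} r(x_t, a_t) \right]
\]
and, for a discount factor $0 < \alpha < 1$, the discounted reward
\[
  V_{\alpha}(\pi, x) := E_x^{\pi}\!\left[ \sum_{t=0}^{\infty}
    \alpha^{t} r(x_t, a_t) \right].
\]
An adaptive policy replaces the unknown density $\rho$ at stage $t$ by
an estimate $\rho_t$ built from the observed disturbances
$\xi_0, \ldots, \xi_{t-1}$, and then acts as if $\rho_t$ were the true
density.
\end{document}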


Additional information

Research partially supported by The Third World Academy of Sciences under Research Grant No. MP 898-152.

Cite this article

Hernández-Lerma, O., Cavazos-Cadena, R.: Density estimation and adaptive control of Markov processes: Average and discounted criteria. Acta Appl. Math. 20, 285–307 (1990). https://doi.org/10.1007/BF00049572

