Abstract
We consider a class of discrete-time Markov control processes with Borel state and action spaces, driven by i.i.d. disturbances in ℝ^d with unknown distribution μ. Under mild semi-continuity and compactness conditions, and assuming that μ is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies which are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results exploit well-known facts from the theory of density estimation. This approach allows us to avoid the restrictive conditions on the state space and/or on the system's transition law imposed in recent works, and it points the way to further applications of nonparametric (density) estimation in adaptive control.
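The key tool named in the abstract is nonparametric estimation of the unknown disturbance density. As a minimal illustrative sketch (not the authors' construction, which rests on L1 density-estimation results), a Gaussian kernel estimate of the disturbance density, built from observed i.i.d. noise samples, might look like the following; all names here (`kde`, `f_hat`, `bandwidth`) are hypothetical:

```python
import numpy as np

def kde(samples, bandwidth):
    """Gaussian kernel density estimate built from i.i.d. samples of the
    unknown disturbance distribution mu (illustrative, not the paper's scheme)."""
    samples = np.asarray(samples, dtype=float)
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)

    def f_hat(x):
        # One Gaussian bump per observed disturbance, averaged.
        u = (np.asarray(x, dtype=float)[..., None] - samples) / bandwidth
        return np.exp(-0.5 * u ** 2).sum(axis=-1) / norm

    return f_hat

rng = np.random.default_rng(0)
xi = rng.normal(size=500)          # observed i.i.d. disturbances (true mu = N(0, 1))
f_hat = kde(xi, bandwidth=0.3)     # plug-in estimate of the density of mu

grid = np.linspace(-5.0, 5.0, 1001)
mass = f_hat(grid).sum() * (grid[1] - grid[0])   # Riemann sum; close to 1
```

An adaptive policy in this spirit would, at each stage, recompute the estimate from the disturbances observed so far and act optimally as if the estimate were the true density, i.e., a certainty-equivalence scheme.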
References
Acosta Abreu, R. S.: Controlled Markov chains with unknown parameters and metric state space, Bol. Soc. Mat. Mexicana, in press (in Spanish).
Acosta Abreu, R. S. and Hernández-Lerma, O.: Iterative adaptive control of denumerable state average-cost Markov Systems, Control Cyber. 14 (1985), 313–322.
Adams, R. A.: Sobolev Spaces, Academic Press, New York, 1975.
Ash, R. B.: Real Analysis and Probability, Academic Press, New York, 1972.
Bertsekas, D. P.: Dynamic Programming and Stochastic Control, Academic Press, New York, 1976.
Bertsekas, D. P. and Shreve, S. E.: Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978.
Cavazos-Cadena, R.: Finite-state approximations for denumerable state discounted Markov decision processes, Appl. Math. Optim. 14 (1986), 1–26.
Cavazos-Cadena, R.: Finite-state approximations and adaptive control of discounted Markov decision processes with unbounded rewards, Control Cyber. 16 (1987), 31–58.
Cavazos-Cadena, R.: Nonparametric adaptive control of discounted stochastic systems with compact state space, J. Optim. Theory Appl. 65 (1990), 191–207.
Devroye, L.: A Course in Density Estimation, Birkhäuser, Boston, 1987.
Devroye, L. and Györfi, L.: Nonparametric Density Estimation: The L1 View, Wiley, New York, 1985.
Dynkin, E. B. and Yushkevich, A. A.: Controlled Markov Processes, Springer-Verlag, New York, 1979.
Doukhan, P. and Ghindès, M.: Etude du processus 306–1, C.R. Acad. Sci. Paris, Sér. A 290 (1980), 921–923.
Flynn, J.: Conditions for the equivalence of optimality criteria in dynamic programming, Ann. Statist. 4 (1976), 936–953.
Georgin, J. P.: Estimation et contrôle des chaînes de Markov sur des espaces arbitraires, Lecture Notes Math. 636 (1978), 71–113.
Georgin, J. P.: Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré, Sect. B 14 (1978), 255–277.
Gihman, I. I. and Skorohod, A. V.: Controlled Stochastic Processes, Springer-Verlag, New York, 1979.
Gordienko, E. I.: Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504–518.
Hernández-Lerma, O.: Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435–445.
Hernández-Lerma, O.: Approximation and adaptive control of Markov processes: Average reward criterion, Kybernetika (Prague) 23 (1987), 265–288.
Hernández-Lerma, O.: Adaptive Markov Control Processes, Springer-Verlag, New York, 1989.
Hernández-Lerma, O. and Cavazos-Cadena, R.: Continuous dependence of stochastic control models on the noise distribution, Appl. Math. Optim. 17 (1988), 79–89.
Hernández-Lerma, O. and Marcus, S. I.: Adaptive control of discounted Markov decision chains, J. Optim. Theory Appl. 46 (1985), 227–235.
Hernández-Lerma, O. and Marcus, S. I.: Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution, Syst. Control Lett. 9 (1987), 307–315.
Hernández-Lerma, O., Esparza, S. O., and Duran, B. S.: Recursive nonparametric estimation of nonstationary Markov processes, Bol. Soc. Mat. Mexicana 33 (1988).
Himmelberg, C. J., Parthasarathy, T., and Van Vleck, F. S.: Optimal plans for dynamic programming problems, Math. Oper. Res. 1 (1976), 390–394.
Hinderer, K.: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, Lecture Notes Oper. Res. 33, Springer-Verlag, New York, 1970.
Iosifescu, M.: On two recent papers on ergodicity in nonhomogeneous Markov chains, Ann. Math. Statist. 43 (1972), 1732–1736.
Mandl, P.: Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40–60.
Prakasa Rao, B. L. S.: Nonparametric Functional Estimation, Academic Press, New York, 1983.
Ross, S. M.: Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970.
Rudin, W.: Functional Analysis, McGraw-Hill, New York, 1973.
Schäl, M.: Estimation and control in discounted stochastic dynamic programming, Stochastics 20 (1987), 51–71.
Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Geb. 32 (1975), 179–196.
Ueno, T.: Some limit theorems for temporally discrete Markov processes, J. Fac. Sci. Univ. Tokyo 7 (1957), 449–462.
Additional information
Research partially supported by The Third World Academy of Sciences under Research Grant No. MP 898-152.
Cite this article
Hernández-Lerma, O., Cavazos-Cadena, R.: Density estimation and adaptive control of Markov processes: Average and discounted criteria. Acta Appl. Math. 20, 285–307 (1990). https://doi.org/10.1007/BF00049572