Abstract
We consider a class of discrete-time Markov control processes with Borel state and action spaces, driven by i.i.d. disturbances in ℝ^d with unknown distribution μ. Under mild semi-continuity and compactness conditions, and assuming that μ is absolutely continuous with respect to Lebesgue measure, we establish the existence of adaptive control policies which are (1) optimal for the average-reward criterion, and (2) asymptotically optimal in the discounted case. Our results exploit well-known facts from the theory of density estimation. This approach allows us to avoid the restrictive conditions on the state space and/or on the system's transition law imposed in recent works, and it points the way to further applications of nonparametric (density) estimation in adaptive control.
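The key tool named in the abstract is nonparametric estimation of the unknown disturbance density. As a minimal illustrative sketch (not the authors' construction, which rests on L1 density-estimation results), a Gaussian kernel estimate of the disturbance density, built from observed i.i.d. noise samples, might look like the following; all names here (`kde`, `f_hat`, `bandwidth`) are hypothetical:

```python
import numpy as np

def kde(samples, bandwidth):
    """Gaussian kernel density estimate built from i.i.d. samples of the
    unknown disturbance distribution mu (illustrative, not the paper's scheme)."""
    samples = np.asarray(samples, dtype=float)
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)

    def f_hat(x):
        # One Gaussian bump per observed disturbance, averaged.
        u = (np.asarray(x, dtype=float)[..., None] - samples) / bandwidth
        return np.exp(-0.5 * u ** 2).sum(axis=-1) / norm

    return f_hat

rng = np.random.default_rng(0)
xi = rng.normal(size=500)          # observed i.i.d. disturbances (true mu = N(0, 1))
f_hat = kde(xi, bandwidth=0.3)     # plug-in estimate of the density of mu

grid = np.linspace(-5.0, 5.0, 1001)
mass = f_hat(grid).sum() * (grid[1] - grid[0])   # Riemann sum; close to 1
```

An adaptive policy in this spirit would, at each stage, recompute the estimate from the disturbances observed so far and act optimally as if the estimate were the true density, i.e., a certainty-equivalence scheme.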
References
Acosta Abreu, R. S.: Controlled Markov chains with unknown parameters and metric state space, Bol. Soc. Mat. Mexicana, in press (in Spanish).
Acosta Abreu, R. S. and Hernández-Lerma, O.: Iterative adaptive control of denumerable state average-cost Markov Systems, Control Cyber. 14 (1985), 313–322.
Adams, R. A.: Sobolev Spaces, Academic Press, New York, 1975.
Ash, R. B.: Real Analysis and Probability, Academic Press, New York, 1972.
Bertsekas, D. P.: Dynamic Programming and Stochastic Control, Academic Press, New York, 1976.
Bertsekas, D. P. and Shreve, S. E.: Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978.
Cavazos-Cadena, R.: Finite-state approximations for denumerable state discounted Markov decision processes, Appl. Math. Optim. 14 (1986), 1–26.
Cavazos-Cadena, R.: Finite-state approximations and adaptive control of discounted Markov decision processes with unbounded rewards, Control Cyber. 16 (1987), 31–58.
Cavazos-Cadena, R.: Nonparametric adaptive control of discounted stochastic systems with compact state space, J. Optim. Theory Appl. 65 (1990), 191–207.
Devroye, L.: A Course in Density Estimation, Birkhäuser, Boston, 1987.
Devroye, L. and Györfi, L.: Nonparametric Density Estimation: The L1 View, Wiley, New York, 1985.
Dynkin, E. B. and Yushkevich, A. A.: Controlled Markov Processes, Springer-Verlag, New York, 1979.
Doukhan, P. and Ghindès, M.: Etude du processus 306–1, C.R. Acad. Sci. Paris, Sér. A 290 (1980), 921–923.
Flynn, J.: Conditions for the equivalence of optimality criteria in dynamic programming, Ann. Statist. 4 (1976), 936–953.
Georgin, J. P.: Estimation et contrôle des chaînes de Markov sur des espaces arbitraires, Lecture Notes Math. 636 (1978), 71–113.
Georgin, J. P.: Contrôle de chaînes de Markov sur des espaces arbitraires, Ann. Inst. H. Poincaré, Sect. B 14 (1978), 255–277.
Gihman, I. I. and Skorohod, A. V.: Controlled Stochastic Processes, Springer-Verlag, New York, 1979.
Gordienko, E. I.: Adaptive strategies for certain classes of controlled Markov processes, Theory Probab. Appl. 29 (1985), 504–518.
Hernández-Lerma, O.: Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435–445.
Hernández-Lerma, O.: Approximation and adaptive control of Markov processes: Average reward criterion, Kybernetika (Prague) 23 (1987), 265–288.
Hernández-Lerma, O.: Adaptive Markov Control Processes, Springer-Verlag, New York, 1989.
Hernández-Lerma, O. and Cavazos-Cadena, R.: Continuous dependence of stochastic control models on the noise distribution, Appl. Math. Optim. 17 (1988), 79–89.
Hernández-Lerma, O. and Marcus, S. I.: Adaptive control of discounted Markov decision chains, J. Optim. Theory Appl. 46 (1985), 227–235.
Hernández-Lerma, O. and Marcus, S. I.: Adaptive policies for discrete-time stochastic control systems with unknown disturbance distribution, Syst. Control Lett. 9 (1987), 307–315.
Hernández-Lerma, O., Esparza, S. O., and Duran, B. S.: Recursive nonparametric estimation of nonstationary Markov processes, Bol. Soc. Mat. Mexicana 33 (1988).
Himmelberg, C. J., Parthasarathy, T., and Van Vleck, F. S.: Optimal plans for dynamic programming problems, Math. Oper. Res. 1 (1976), 390–394.
Hinderer, K.: Foundations of Non-Stationary Dynamic Programming with Discrete Time Parameter, Lecture Notes Oper. Res. 33, Springer-Verlag, New York, 1970.
Iosifescu, M.: On two recent papers on ergodicity in nonhomogeneous Markov chains, Ann. Math. Statist. 43 (1972), 1732–1736.
Mandl, P.: Estimation and control in Markov chains, Adv. Appl. Probab. 6 (1974), 40–60.
Prakasa Rao, B. L. S.: Nonparametric Functional Estimation, Academic Press, New York, 1983.
Ross, S. M.: Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970.
Rudin, W.: Functional Analysis, McGraw-Hill, New York, 1973.
Schäl, M.: Estimation and control in discounted stochastic dynamic programming, Stochastics 20 (1987), 51–71.
Schäl, M.: Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal, Z. Wahrsch. verw. Geb. 32 (1975), 179–196.
Ueno, T.: Some limit theorems for temporally discrete Markov processes, J. Fac. Sci. Univ. Tokyo 7 (1957), 449–462.
Additional information
Research partially supported by The Third World Academy of Sciences under Research Grant No. MP 898-152.
Cite this article
Hernández-Lerma, O., Cavazos-Cadena, R.: Density estimation and adaptive control of Markov processes: Average and discounted criteria. Acta Appl. Math. 20, 285–307 (1990). https://doi.org/10.1007/BF00049572