Abstract
The problem of state abstraction is of central importance in optimal control, reinforcement learning, and Markov decision processes. This paper studies variable resolution state abstraction for continuous-time, continuous-space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches, based on value function and policy properties, that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently calculable measure of the extent to which changes in some state affect the value function of other states. Variance is an efficiently calculable measure of how risky a state is in a Markov chain: a low-variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, as given by the value function.
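The variance measure described above admits a simple finite-state illustration. The following sketch is our own toy example, not code from the paper (which works in continuous time and space): the function name, the discrete form of the recursion, and the two-state chain are illustrative assumptions. It iterates the standard Bellman-style recursions for the expected return V and for the variance of the return about V:

```python
def value_and_variance(P, R, gamma=1.0, iters=200):
    """Expected return V and return-variance sigma^2 for an absorbing
    Markov chain.  P[s] maps successor -> probability; R[s][t] is the
    reward on the transition s -> t.  The variance recursion is
    sigma^2(s) = sum_t p(t|s) * [ (R(s,t) + g*V(t) - V(s))^2 + g^2 * sigma^2(t) ].
    States absent from P's keys are terminal (V = sigma^2 = 0)."""
    states = set(P)
    for succ in P.values():
        states |= set(succ)
    V = {s: 0.0 for s in states}
    S2 = {s: 0.0 for s in states}
    for _ in range(iters):
        for s, succ in P.items():
            V[s] = sum(p * (R[s][t] + gamma * V[t]) for t, p in succ.items())
        for s, succ in P.items():
            S2[s] = sum(p * ((R[s][t] + gamma * V[t] - V[s]) ** 2
                             + gamma ** 2 * S2[t])
                        for t, p in succ.items())
    return V, S2

# Toy chain: 'det' deterministically earns reward 1; 'risky' is a 50/50
# gamble between rewards 1 and 0.  Both have terminal successors.
P = {'det': {'win': 1.0}, 'risky': {'win': 0.5, 'lose': 0.5}}
R = {'det': {'win': 1.0}, 'risky': {'win': 1.0, 'lose': 0.0}}
V, S2 = value_and_variance(P, R)
```

Here 'det' gets V = 1 with variance 0 (the realized return never deviates from its expectation), while 'risky' gets V = 0.5 with variance 0.25, matching the intuition that a low-variance state is one whose long-term reward rarely strays from its value.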
The paper proceeds by graphically demonstrating the various approaches to splitting on the familiar non-linear, non-minimum-phase, two-dimensional “Car on the hill” problem. It then evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.
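The variable resolution representation described in the abstract — hyper-rectangular cells stored in a kd-trie, refined by splitting the cells a criterion selects — can be sketched as follows. This is a hypothetical illustration under our own assumptions: the class and function names are ours, and the corner-spread score stands in for one simple value-based local splitting criterion, not for the paper's exact criteria.

```python
from itertools import product

class Cell:
    """A hyper-rectangular cell in a kd-trie over the state space."""
    def __init__(self, lo, hi, depth=0):
        self.lo, self.hi = list(lo), list(hi)   # cell bounds per dimension
        self.depth = depth
        self.children = None                    # None => leaf

    def split(self):
        """Bisect along the dimension cycled by depth (kd-trie convention)."""
        d = self.depth % len(self.lo)
        mid = 0.5 * (self.lo[d] + self.hi[d])
        left_hi = self.hi[:]; left_hi[d] = mid
        right_lo = self.lo[:]; right_lo[d] = mid
        self.children = (Cell(self.lo, left_hi, self.depth + 1),
                         Cell(right_lo, self.hi, self.depth + 1))
        return self.children

    def leaf_containing(self, x):
        """Descend the trie to the leaf cell whose box contains state x."""
        node = self
        while node.children is not None:
            d = node.depth % len(node.lo)
            mid = 0.5 * (node.lo[d] + node.hi[d])
            node = node.children[0] if x[d] < mid else node.children[1]
        return node

def corner_spread(cell, value_fn):
    """A simple local splitting score: the spread of the value function
    over the cell's 2^d corners.  High spread suggests the cell is too
    coarse to represent the value function well."""
    vals = [value_fn(list(c)) for c in product(*zip(cell.lo, cell.hi))]
    return max(vals) - min(vals)
```

A top-down refinement loop would then repeatedly score all leaves with a criterion like `corner_spread`, split the highest-scoring ones, and re-solve for the value function on the refined grid.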
Munos, R., Moore, A. Variable Resolution Discretization in Optimal Control. Machine Learning 49, 291–323 (2002). https://doi.org/10.1023/A:1017992615625