Abstract
The problem of state abstraction is of central importance in optimal control, reinforcement learning, and Markov decision processes. This paper studies variable resolution state abstraction for continuous-time, continuous-space, deterministic dynamic control problems in which near-optimal policies are required. We begin by defining a class of variable resolution policy and value function representations based on Kuhn triangulations embedded in a kd-trie. We then consider top-down approaches to choosing which cells to split in order to generate improved policies. The core of this paper is the introduction and evaluation of a wide variety of possible splitting criteria. We begin with local approaches, based on value function and policy properties, that use only features of individual cells in making split choices. Later, by introducing two new non-local measures, influence and variance, we derive splitting criteria that allow one cell to efficiently take into account its impact on other cells when deciding whether to split. Influence is an efficiently calculable measure of the extent to which changes in some state affect the value function of other states. Variance is an efficiently calculable measure of how risky a state is in a Markov chain: a low-variance state is one in which we would be very surprised if, during any one execution, the long-term reward attained from that state differed substantially from its expected value, as given by the value function.
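The variance measure described above admits a simple finite-state illustration. The following sketch is our own toy example, not code from the paper (which works in continuous time and space): the function name, the discrete form of the recursion, and the two-state chain are illustrative assumptions. It iterates the standard Bellman-style recursions for the expected return V and for the variance of the return about V:

```python
def value_and_variance(P, R, gamma=1.0, iters=200):
    """Expected return V and return-variance sigma^2 for an absorbing
    Markov chain.  P[s] maps successor -> probability; R[s][t] is the
    reward on the transition s -> t.  The variance recursion is
    sigma^2(s) = sum_t p(t|s) * [ (R(s,t) + g*V(t) - V(s))^2 + g^2 * sigma^2(t) ].
    States absent from P's keys are terminal (V = sigma^2 = 0)."""
    states = set(P)
    for succ in P.values():
        states |= set(succ)
    V = {s: 0.0 for s in states}
    S2 = {s: 0.0 for s in states}
    for _ in range(iters):
        for s, succ in P.items():
            V[s] = sum(p * (R[s][t] + gamma * V[t]) for t, p in succ.items())
        for s, succ in P.items():
            S2[s] = sum(p * ((R[s][t] + gamma * V[t] - V[s]) ** 2
                             + gamma ** 2 * S2[t])
                        for t, p in succ.items())
    return V, S2

# Toy chain: 'det' deterministically earns reward 1; 'risky' is a 50/50
# gamble between rewards 1 and 0.  Both have terminal successors.
P = {'det': {'win': 1.0}, 'risky': {'win': 0.5, 'lose': 0.5}}
R = {'det': {'win': 1.0}, 'risky': {'win': 1.0, 'lose': 0.0}}
V, S2 = value_and_variance(P, R)
```

Here 'det' gets V = 1 with variance 0 (the realized return never deviates from its expectation), while 'risky' gets V = 0.5 with variance 0.25, matching the intuition that a low-variance state is one whose long-term reward rarely strays from its value.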
The paper proceeds by graphically demonstrating the various approaches to splitting on the familiar non-linear, non-minimum-phase, two-dimensional “Car on the hill” problem. It then evaluates the performance of a variety of splitting criteria on many benchmark problems, paying careful attention to their number-of-cells versus closeness-to-optimality tradeoff curves.
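The variable resolution representation described in the abstract — hyper-rectangular cells stored in a kd-trie, refined by splitting the cells a criterion selects — can be sketched as follows. This is a hypothetical illustration under our own assumptions: the class and function names are ours, and the corner-spread score stands in for one simple value-based local splitting criterion, not for the paper's exact criteria.

```python
from itertools import product

class Cell:
    """A hyper-rectangular cell in a kd-trie over the state space."""
    def __init__(self, lo, hi, depth=0):
        self.lo, self.hi = list(lo), list(hi)   # cell bounds per dimension
        self.depth = depth
        self.children = None                    # None => leaf

    def split(self):
        """Bisect along the dimension cycled by depth (kd-trie convention)."""
        d = self.depth % len(self.lo)
        mid = 0.5 * (self.lo[d] + self.hi[d])
        left_hi = self.hi[:]; left_hi[d] = mid
        right_lo = self.lo[:]; right_lo[d] = mid
        self.children = (Cell(self.lo, left_hi, self.depth + 1),
                         Cell(right_lo, self.hi, self.depth + 1))
        return self.children

    def leaf_containing(self, x):
        """Descend the trie to the leaf cell whose box contains state x."""
        node = self
        while node.children is not None:
            d = node.depth % len(node.lo)
            mid = 0.5 * (node.lo[d] + node.hi[d])
            node = node.children[0] if x[d] < mid else node.children[1]
        return node

def corner_spread(cell, value_fn):
    """A simple local splitting score: the spread of the value function
    over the cell's 2^d corners.  High spread suggests the cell is too
    coarse to represent the value function well."""
    vals = [value_fn(list(c)) for c in product(*zip(cell.lo, cell.hi))]
    return max(vals) - min(vals)
```

A top-down refinement loop would then repeatedly score all leaves with a criterion like `corner_spread`, split the highest-scoring ones, and re-solve for the value function on the refined grid.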
Munos, R., Moore, A. Variable Resolution Discretization in Optimal Control. Machine Learning 49, 291–323 (2002). https://doi.org/10.1023/A:1017992615625