Abstract
Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluating a path integral over state trajectories. However, this potential remains largely untapped in real-world problems because of two main limitations: first, current approaches can typically only learn open-loop controllers, and second, current sampling procedures are inefficient and do not scale to high-dimensional systems. We introduce the efficient Path Integral Relative-Entropy Policy Search (PI-REPS) algorithm for learning feedback policies with PI control. Our algorithm is inspired by the information-theoretic policy updates that are often used in policy search. We use these updates to approximate the state trajectory distribution that is known to be optimal from PI control theory. Our approach allows for a principled treatment of different sampling distributions and can be used to estimate many types of parametric or non-parametric feedback controllers. We show that PI-REPS significantly outperforms current methods and solves tasks that were previously out of reach.
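The abstract outlines the core computational pattern: sampled trajectories are reweighted by their exponentiated cost, as prescribed by PI control theory, with a REPS-style relative-entropy (KL) bound limiting how far the reweighted trajectory distribution may move from the sampling distribution; a feedback policy is then fit to the reweighted samples. The sketch below illustrates this pattern on a toy one-dimensional system. It is a heavily simplified illustration under our own assumptions, not the authors' implementation: the function names, the quadratic cost, the toy dynamics, and the time-invariant linear-Gaussian policy class are all hypothetical choices made for the example.

# Hypothetical sketch of a PI-REPS-style update (not the paper's code):
# 1) weight trajectories by exp(-cost / eta), with the temperature eta set by
#    minimizing the standard REPS dual under a KL bound epsilon;
# 2) fit a linear-Gaussian feedback policy u = K x + k by weighted regression.
import numpy as np
from scipy.optimize import minimize_scalar

def reps_weights(costs, epsilon=0.5):
    """Per-trajectory weights w_i proportional to exp(-cost_i / eta)."""
    costs = costs - costs.min()  # shift costs for numerical stability
    def dual(log_eta):
        # REPS dual: g(eta) = eta * epsilon + eta * log E[exp(-cost / eta)]
        eta = np.exp(log_eta)    # optimize over log(eta) to keep eta > 0
        return eta * epsilon + eta * np.log(np.mean(np.exp(-costs / eta)))
    res = minimize_scalar(dual, bounds=(-5.0, 5.0), method="bounded")
    eta = np.exp(res.x)
    w = np.exp(-costs / eta)
    return w / w.sum()

def fit_feedback_policy(states, actions, w):
    """Weighted least squares for u = K x + k (weighted maximum likelihood
    for the mean of a Gaussian policy)."""
    X = np.hstack([states, np.ones((len(states), 1))])  # append bias column
    Xw = w[:, None] * X
    theta = np.linalg.solve(X.T @ Xw, Xw.T @ actions)   # (X'WX) theta = X'Wu
    return theta[:-1].T, theta[-1]                      # gain K, offset k

# Toy usage: rollouts of a stochastic 1-D linear system with quadratic cost.
rng = np.random.default_rng(0)
n_traj, T = 200, 20
states, actions, costs = [], [], []
for _ in range(n_traj):
    x, c = rng.normal(1.0, 0.5), 0.0
    for _ in range(T):
        u = rng.normal(-0.5 * x, 0.3)          # exploratory sampling policy
        states.append([x]); actions.append([u])
        c += x**2 + 0.1 * u**2                 # quadratic state/action cost
        x = x + 0.1 * u + 0.05 * rng.normal()  # simple stochastic dynamics
    costs.append(c)

w = reps_weights(np.array(costs))
w_sa = np.repeat(w, T) / T  # spread trajectory weights over state-action pairs
K, k = fit_feedback_policy(np.array(states), np.array(actions), w_sa)
print("fitted feedback gain K:", K, "offset k:", k)

In the actual algorithm, the reweighting targets the state trajectory distribution that is optimal under PI control theory, and the policy class can be richer than this example, e.g. time-varying or non-parametric, as the abstract notes; the KL bound is what gives the principled treatment of the sampling distribution across iterations.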
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Gómez, V., Kappen, H.J., Peters, J., Neumann, G. (2014). Policy Search for Path Integral Control. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_31
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9