Abstract
Path integral (PI) control defines a general class of control problems for which the optimal control computation is equivalent to an inference problem that can be solved by evaluating a path integral over state trajectories. However, this potential remains largely untapped in real-world problems because of two main limitations: first, current approaches can typically only learn open-loop controllers, and second, current sampling procedures are inefficient and do not scale to high-dimensional systems. We introduce the efficient Path Integral Relative-Entropy Policy Search (PI-REPS) algorithm for learning feedback policies with PI control. Our algorithm is inspired by the information-theoretic policy updates that are often used in policy search. We use these updates to approximate the state trajectory distribution that is known to be optimal from PI control theory. Our approach allows for a principled treatment of different sampling distributions and can be used to estimate many types of parametric or non-parametric feedback controllers. We show that PI-REPS significantly outperforms current methods and solves tasks that were previously out of reach.
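The abstract outlines the core computational pattern: sampled trajectories are reweighted by their exponentiated cost, as prescribed by PI control theory, with a REPS-style relative-entropy (KL) bound limiting how far the reweighted trajectory distribution may move from the sampling distribution; a feedback policy is then fit to the reweighted samples. The sketch below illustrates this pattern on a toy one-dimensional system. It is a heavily simplified illustration under our own assumptions, not the authors' implementation: the function names, the quadratic cost, the toy dynamics, and the time-invariant linear-Gaussian policy class are all hypothetical choices made for the example.

# Hypothetical sketch of a PI-REPS-style update (not the paper's code):
# 1) weight trajectories by exp(-cost / eta), with the temperature eta set by
#    minimizing the standard REPS dual under a KL bound epsilon;
# 2) fit a linear-Gaussian feedback policy u = K x + k by weighted regression.
import numpy as np
from scipy.optimize import minimize_scalar

def reps_weights(costs, epsilon=0.5):
    """Per-trajectory weights w_i proportional to exp(-cost_i / eta)."""
    costs = costs - costs.min()  # shift costs for numerical stability
    def dual(log_eta):
        # REPS dual: g(eta) = eta * epsilon + eta * log E[exp(-cost / eta)]
        eta = np.exp(log_eta)    # optimize over log(eta) to keep eta > 0
        return eta * epsilon + eta * np.log(np.mean(np.exp(-costs / eta)))
    res = minimize_scalar(dual, bounds=(-5.0, 5.0), method="bounded")
    eta = np.exp(res.x)
    w = np.exp(-costs / eta)
    return w / w.sum()

def fit_feedback_policy(states, actions, w):
    """Weighted least squares for u = K x + k (weighted maximum likelihood
    for the mean of a Gaussian policy)."""
    X = np.hstack([states, np.ones((len(states), 1))])  # append bias column
    Xw = w[:, None] * X
    theta = np.linalg.solve(X.T @ Xw, Xw.T @ actions)   # (X'WX) theta = X'Wu
    return theta[:-1].T, theta[-1]                      # gain K, offset k

# Toy usage: rollouts of a stochastic 1-D linear system with quadratic cost.
rng = np.random.default_rng(0)
n_traj, T = 200, 20
states, actions, costs = [], [], []
for _ in range(n_traj):
    x, c = rng.normal(1.0, 0.5), 0.0
    for _ in range(T):
        u = rng.normal(-0.5 * x, 0.3)          # exploratory sampling policy
        states.append([x]); actions.append([u])
        c += x**2 + 0.1 * u**2                 # quadratic state/action cost
        x = x + 0.1 * u + 0.05 * rng.normal()  # simple stochastic dynamics
    costs.append(c)

w = reps_weights(np.array(costs))
w_sa = np.repeat(w, T) / T  # spread trajectory weights over state-action pairs
K, k = fit_feedback_policy(np.array(states), np.array(actions), w_sa)
print("fitted feedback gain K:", K, "offset k:", k)

In the actual algorithm, the reweighting targets the state trajectory distribution that is optimal under PI control theory, and the policy class can be richer than this example, e.g. time-varying or non-parametric, as the abstract notes; the KL bound is what gives the principled treatment of the sampling distribution across iterations.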
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
Cite this paper
Gómez, V., Kappen, H.J., Peters, J., Neumann, G. (2014). Policy Search for Path Integral Control. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds.) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44848-9_31
Print ISBN: 978-3-662-44847-2
Online ISBN: 978-3-662-44848-9