Abstract
Inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently attracted increasing interest. This paper presents an incremental approach to online IRL. First, the convergence of the incremental method for the IRL problem is investigated, and bounds on both the number of mistakes made during learning and the regret are established with detailed proofs. An online algorithm based on incremental error correction is then derived: each time the action chosen under the current reward estimate mismatches the expert's, an increment is added to that estimate, driving it toward a target optimal value. The proposed method was tested in a driving simulation experiment and was found to efficiently recover an adequate reward function.
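The incremental error-correcting idea in the abstract can be illustrated with a perceptron-style sketch. This is not the paper's algorithm: it is a simplified, one-step approximation that assumes a linear reward over state-action features and updates the weight vector only when the greedy action disagrees with the expert's, ignoring the long-horizon value computation of a full MDP. All names (`incremental_irl`, `phi`, `demos`) are hypothetical.

```python
import numpy as np

def incremental_irl(demos, phi, actions, alpha=0.1, epochs=10):
    """Perceptron-style incremental reward update (simplified sketch).

    demos:   list of (state, expert_action) pairs
    phi:     feature map phi(state, action) -> np.ndarray
    actions: finite action set
    Returns the learned weight vector and the total mistake count.
    """
    w = np.zeros_like(phi(demos[0][0], demos[0][1]))
    mistakes = 0
    for _ in range(epochs):
        for s, a_star in demos:
            # greedy action under the current reward estimate
            a_hat = max(actions, key=lambda a: w @ phi(s, a))
            if a_hat != a_star:
                # error-correcting increment toward the expert's choice
                w += alpha * (phi(s, a_star) - phi(s, a_hat))
                mistakes += 1
    return w, mistakes
```

On a toy problem with one-hot state-action features, a handful of passes suffice for the greedy action under the learned reward to match the expert on every demonstrated state; the mistake count plays the role of the mistake bound analyzed in the paper.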
Project (No. 90820306) supported by the National Natural Science Foundation of China
Cite this article
Jin, Zj., Qian, H., Chen, Sy. et al. Convergence analysis of an incremental approach to online inverse reinforcement learning. J. Zhejiang Univ. - Sci. C 12, 17–24 (2011). https://doi.org/10.1631/jzus.C1010010
Key words
- Incremental approach
- Reward recovering
- Online learning
- Inverse reinforcement learning
- Markov decision process