
Convergence analysis of an incremental approach to online inverse reinforcement learning

Published in: Journal of Zhejiang University SCIENCE C

Abstract

Interest in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert, has recently increased. This paper deals with an incremental approach to online IRL. First, the convergence of the incremental method for the IRL problem is investigated, and bounds on both the number of mistakes made during learning and the regret are established with a detailed proof. An online algorithm based on incremental error correction is then derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, so that the estimate approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to efficiently recover an adequate reward function.
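The error-correcting idea stated in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' algorithm: it assumes a linear reward parameterized by a weight vector over state-action features, a perceptron-style increment with a fixed step size eta, and hypothetical helper functions greedy_action and phi supplied by the caller.

import numpy as np

def incremental_irl(states, expert_actions, phi, greedy_action, eta=0.1, n_features=4):
    """Online IRL sketch: nudge the reward weights whenever the action that is
    greedy under the current reward estimate disagrees with the expert's action."""
    w = np.zeros(n_features)              # current reward estimate (weights)
    mistakes = 0                           # count of action mismatches
    for s, a_expert in zip(states, expert_actions):
        a_hat = greedy_action(w, s)        # action preferred under current estimate
        if a_hat != a_expert:              # action mismatch: apply an increment
            mistakes += 1
            # move w toward features of the expert's action and away from the
            # features of the mismatched action
            w += eta * (phi(s, a_expert) - phi(s, a_hat))
    return w, mistakes

Under assumptions of this kind, the analysis in the paper bounds how many such mismatches (mistakes) can occur and the resulting regret; the sketch above only illustrates where the increment is applied.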



Author information

Corresponding author

Correspondence to Hui Qian.

Additional information

Project (No. 90820306) supported by the National Natural Science Foundation of China


About this article

Cite this article

Jin, Z.J., Qian, H., Chen, S.Y., et al. Convergence analysis of an incremental approach to online inverse reinforcement learning. J. Zhejiang Univ.-Sci. C 12, 17–24 (2011). https://doi.org/10.1631/jzus.C1010010



  • DOI: https://doi.org/10.1631/jzus.C1010010

Key words

CLC number
