Abstract
In a wide variety of situations one would like an expressive and accurate model of observed animal or human behavior. While general-purpose mathematical models may successfully capture properties of observed behavior, it is desirable to root models in biological facts. Because of ample empirical evidence for reward-based learning in visuomotor tasks, we use a computational model based on the assumption that the observed agent balances the costs and benefits of its behavior to meet its goals. This leads to the framework of reinforcement learning, which additionally provides well-established algorithms for learning solutions to visuomotor tasks. To quantify the agent’s goals as rewards implicit in the observed behavior, we propose inverse reinforcement learning. Based on the assumption of a modular cognitive architecture, we introduce a modular inverse reinforcement learning algorithm that estimates the relative reward contributions of the component tasks in navigation: following a path while avoiding obstacles and approaching targets. We show how to recover the component reward weights for the individual tasks and that variability in observed trajectories can be explained succinctly through behavioral goals. Simulations demonstrate that good estimates can already be obtained with modest amounts of observation data, which in turn allows the prediction of behavior in novel configurations.
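To make the estimation problem concrete, the following is a minimal sketch, not the authors' published algorithm: it assumes tabular per-module action-value functions and a softmax (Boltzmann) observation model, and all names (`fit_weights`, `module_qs`, `beta`) as well as the synthetic demo are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w, module_qs, observations, beta=1.0):
    """Negative log-likelihood of observed (state, action) pairs under a
    softmax policy on the weighted sum of per-module Q-tables."""
    Q = sum(wi * qi for wi, qi in zip(w, module_qs))   # (n_states, n_actions)
    logits = beta * Q
    # Row-normalized log-probabilities over actions.
    log_pi = logits - np.logaddexp.reduce(logits, axis=1, keepdims=True)
    s, a = observations[:, 0], observations[:, 1]
    return -log_pi[s, a].sum()

def fit_weights(module_qs, observations):
    """Maximum-likelihood reward weights, constrained to the simplex since
    only relative contributions are identifiable."""
    k = len(module_qs)
    w0 = np.full(k, 1.0 / k)
    res = minimize(neg_log_likelihood, w0, args=(module_qs, observations),
                   bounds=[(0.0, 1.0)] * k,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0})
    return res.x

# Synthetic demo with three hypothetical modules (e.g. path, obstacle, target).
rng = np.random.default_rng(0)
n_states, n_actions = 100, 4
module_qs = [rng.standard_normal((n_states, n_actions)) for _ in range(3)]
w_true = np.array([0.6, 0.3, 0.1])
logits = sum(w * q for w, q in zip(w_true, module_qs))
pi = np.exp(logits - np.logaddexp.reduce(logits, axis=1, keepdims=True))
states = rng.integers(0, n_states, size=2000)
actions = np.array([rng.choice(n_actions, p=pi[s]) for s in states])
obs = np.column_stack([states, actions])
print(fit_weights(module_qs, obs))   # approximately recovers w_true
```

Because a common scaling of the weights trades off against the softmax temperature, only the relative reward contributions are identifiable; constraining the weights to the probability simplex resolves this degeneracy.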
Notes
The basis of the model developed in this paper was published previously as part of a PhD thesis (Rothkopf 2008).
References
Barrett HC, Kurzban R (2006) Modularity in cognition: framing the debate. Psychol Rev 113(3):628
Barto AC (1995) Adaptive critics and the basal ganglia. In: Houk JC, Davis JL, Beiser DG (eds) Models of information processing in the basal ganglia. MIT Press, Cambridge, MA, pp 215–232
Billard A, Mataric MJ (2001) Learning human arm movements by imitation: evaluation of a biologically inspired connectionist architecture. Robotics Auton Syst 37:145–160
Bromberg-Martin ES, Matsumoto M, Hikosaka O (2010) Dopamine in motivational control: Rewarding, aversive, and alerting. Neuron 68:815–834
Brooks R (1986) A robust layered control system for a mobile robot. IEEE J Robotics Autom 2(1):14–23
Chang Y-H, Ho T, Kaelbling LP (2004) All learning is local: multi-agent learning in global reward games. In: Thrun S, Saul L, Schölkopf B (eds) Advances in neural information processing systems 16. MIT Press, Cambridge, MA
Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441(7095):876–879. doi:10.1038/nature04766
Daw ND, Doya K (2006) The computational neurobiology of learning and reward. Curr Opin Neurobiol 16(2):199–204
Dayan P, Hinton GE (1992) Feudal reinforcement learning. In: Advances in neural information processing systems 5. Morgan Kaufmann Publishers, Burlington, pp 271–278
Dimitrakakis C, Rothkopf CA (2011) Bayesian multitask inverse reinforcement learning. In: European workshop on reinforcement learning (EWRL)
Fajen BR, Warren WH (2003) Behavioral dynamics of steering, obstacle avoidance, and route selection. J Exp Psychol Hum Percept Perform 29(2):343
Fodor JA (1983) Modularity of mind. MIT Press, Cambridge, MA
Gershman SJ, Pesaran B, Daw ND (2009) Human reinforcement learning subdivides structured action spaces by learning effector-specific values. J Neurosci 29(43):13524–13531
Glimcher PW (2004) Decisions, uncertainty, and the brain: the science of neuroeconomics. MIT Press, Bradford Books, Cambridge, MA
Gold JI, Shadlen MN (2007) The neural basis of decision making. Annu Rev Neurosci 30(1):535–574. doi:10.1146/annurev.neuro.29.051605.113038
Graybiel AM, Aosaki T, Flaherty AW, Kimura M (1994) The basal ganglia and adaptive motor control. Science 265(5180):1826–1831
Haber SN (2003) The primate basal ganglia: parallel and integrative networks. J Chem Neuroanat 26(4):317–330
Humphrys M (1996) Action selection methods using reinforcement learning. In: Maes P, Mataric M, Meyer J-A, Pollack J, Wilson SW (eds) From animals to animats 4: proceedings of the fourth international conference on simulation of adaptive behavior. MIT Press, Bradford Books, Cambridge, MA, pp 135–144
Kaelbling LP (1993) Hierarchical learning in stochastic domains: preliminary results. In: Proceedings of the tenth international conference on machine learning, vol 951, pp 167–173
Lee YJ, Mangasarian OL (2001) SSVM: a smooth support vector machine for classification. Comput Optim Appl 20(1):5–22
Lopes M, Melo F, Montesano L (2009) Active learning for reward estimation in inverse reinforcement learning. In: Buntine W, Grobelnik M, Mladenić D, Shawe-Taylor J (eds) Machine learning and knowledge discovery in databases. Lecture notes in computer science, vol 5782. Springer, Berlin, Heidelberg, pp 31–46. http://dx.doi.org/10.1007/978-3-642-04174-7_3
Minsky M (1988) The society of mind. Simon and Schuster, New York, NY
Montague PR, Dayan P, Sejnowski TJ (1996) A framework for mesencephalic dopamine systems based on predictive Hebbian learning. J Neurosci 16:1936–1947
Neu G, Szepesvári C (2007) Apprenticeship learning using inverse reinforcement learning and gradient methods. In: Proceedings of the 23rd conference on uncertainty in artificial intelligence, pp 295–302
Ng AY, Russell S (2000) Algorithms for inverse reinforcement learning. In: Proceedings 17th international conference on machine learning, Morgan Kaufmann, pp 663–670
Pastor P, Hoffmann H, Asfour T, Schaal S (2009) Learning and generalization of motor skills by learning from demonstration. In: International conference on robotics and automation
Pinker SA (1999) How the mind works. Ann N Y Acad Sci 882(1):119–127
Puterman ML (1994) Markov decision processes. Wiley, New York, NY
Ramachandran D, Amir E (2007) Bayesian inverse reinforcement learning. In: 20th international joint conference on artificial intelligence
Rothkopf CA (2008) Modular models of task based visually guided behavior. PhD thesis, Department of Brain and Cognitive Sciences, Department of Computer Science, University of Rochester
Rothkopf CA, Ballard DH (2010) Credit assignment in multiple goal embodied visuomotor behavior. Front Psychol 1:173, Special Issue on Embodied Cognition
Rothkopf CA, Dimitrakakis C (2011) Preference elicitation and inverse reinforcement learning. In: 22nd European conference on machine learning (ECML)
Rummery GA, Niranjan M (1994) On-line Q-learning using connectionist systems. Technical report CUED/F-INFENG/TR 166, Cambridge University Engineering Department
Russell S, Zimdars AL (2003) Q-decomposition for reinforcement learning agents. In: Proceedings of the international conference on machine learning, vol 20, p 656
Samejima K, Ueda Y, Doya K, Kimura M (2005) Representation of action-specific reward values in the striatum. Science 310(5752):1337
Schmidt M, Fung G, Rosales R (2007) Fast optimization methods for L1 regularization: a comparative study and two new approaches. In: Kok J, Koronacki J, Mantaras R, Matwin S, Mladenic D, Skowron A (eds) Machine learning: ECML 2007. Lecture notes in computer science, vol 4701. Springer, Berlin, pp 286–297
Schöner G, Dose M (1992) A dynamical systems approach to task-level system integration used to plan and control autonomous vehicle motion. Robotics Auton Syst 10(4):253–267
Schultz W, Dayan P, Montague PR (1997) A neural substrate of prediction and reward. Science 275:1593–1599
Seymour B, O’Doherty JP, Dayan P, Koltzenburg M, Jones AK, Dolan RJ, Friston KJ, Frackowiak RS (2004) Temporal difference models describe higher-order learning in humans. Nature 429(6992):664–667
Singh S, Cohn D (1998) How to dynamically merge Markov decision processes. In: Neural information processing systems 10, pp 1057–1063
Sprague N, Ballard D (2003) Multiple-goal reinforcement learning with modular sarsa(0). In: International joint conference on artificial intelligence, Acapulco, August 2003
Sprague N, Ballard DH (2007) Modeling embodied visual behaviors. ACM Trans Appl Percept 4(2):11
Sutton RS (1988) Learning to predict by the methods of temporal differences. Mach Learn 3:9–44
Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA
Von Neumann J, Morgenstern O (1947) Theory of games and economic behavior. Princeton University Press, Princeton, NJ
Whitehead SD (1991) A complexity analysis of cooperative mechanisms in reinforcement learning. In: Proceedings of the association for artificial intelligence
Whitehead SD, Ballard DH (1991) Learning to perceive and act by trial and error. Mach Learn 7:45–83
Ziebart BD, Bagnell JA, Dey AK (2010) Modeling interaction via the principle of maximum causal entropy. In: Fürnkranz J, Joachims T (eds) Proceedings of the 27th international conference on machine learning (ICML-10), Haifa, Israel, pp 1255–1262
Acknowledgments
The research reported herein was supported by NIH Grant RR009283 and NSF Grant 0932277. CR was additionally supported by the BMBF Project Bernstein Fokus: Neurotechnologie Frankfurt, FKZ 01GQ0840, and the EU Project IM-CLeVeR, FP7-ICT-IP-231722.
Cite this article
Rothkopf, C.A., Ballard, D.H. Modular inverse reinforcement learning for visuomotor behavior. Biol Cybern 107, 477–490 (2013). https://doi.org/10.1007/s00422-013-0562-6