
Reducing Computational Cost During Robot Navigation and Human–Robot Interaction with a Human-Inspired Reinforcement Learning Architecture

International Journal of Social Robotics

Abstract

We present a new neuro-inspired reinforcement learning architecture for robot online learning and decision-making in both social and non-social scenarios. The goal is to take inspiration from the way humans dynamically and autonomously adapt their behavior according to variations in their own performance while minimizing cognitive effort. Following computational neuroscience principles, the architecture combines model-based (MB) and model-free (MF) reinforcement learning (RL). The main novelty lies in the arbitration mechanism: a meta-controller selects the current learning strategy according to a trade-off between efficiency and computational cost. The MB strategy, which builds a model of the long-term effects of actions and decides through dynamic programming over this model, enables flexible adaptation to task changes at the expense of a high computational cost. The MF strategy is less flexible but 1000 times less costly, and learns by observing MB decisions. We test the architecture in three experiments: a navigation task in a real environment with task changes (wall configuration changes, goal location changes); a simulated object manipulation task under human teaching signals; and a simulated human–robot cooperation task to tidy up objects on a table. We show that our human-inspired strategy coordination method enables the robot to maintain optimal performance in terms of the trade-off between reward and computational cost, compared to an MB expert alone, which achieves the best reward but incurs the highest computational cost. We also show that the method copes with sudden changes in the environment, in the goal, or in the behavior of the human partner during interaction tasks. The robots that performed these experiments, whether real or virtual, all used the same set of parameters, demonstrating the generality of the method.
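
To make the arbitration principle concrete, the Python fragment below illustrates one way a meta-controller could choose, for the current decision, between an MB planner and an MF learner by weighing estimated decision quality against estimated computational cost. It is a minimal, hypothetical sketch: the class names, the operation-count cost measure and the scoring rule are illustrative assumptions, not the authors' released implementation (see Data and code availability below).

# Illustrative sketch only (not the authors' code): a meta-controller that
# arbitrates between a costly model-based (MB) expert and a cheap model-free
# (MF) expert by trading off estimated decision quality against computational
# cost. All names and the scoring rule are assumptions made for illustration.

class ModelFreeExpert:
    """Tabular Q-learning expert: cheap to query, slower to adapt."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.95):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma = alpha, gamma

    def propose(self, state):
        values = self.q[state]
        action = max(range(len(values)), key=values.__getitem__)
        return action, len(values)  # chosen action and a rough operation count

    def observe(self, state, action, reward, next_state):
        # Q-learning update; also used to learn by observing the MB expert's choices.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])

class ModelBasedExpert:
    """Planning expert: flexible, but each decision requires many model sweeps."""

    def __init__(self, n_states, n_actions, planning_iters=50):
        self.n_states, self.n_actions = n_states, n_actions
        self.planning_iters = planning_iters

    def propose(self, state):
        # A real implementation would run value iteration on a learned
        # transition/reward model here; this sketch only returns a placeholder
        # action and an approximate operation count for the planning sweep.
        best_action = 0
        cost = self.planning_iters * self.n_states * self.n_actions
        return best_action, cost

def arbitrate(mb_quality, mf_quality, mb_cost, mf_cost, cost_weight=1e-4):
    """Select the expert that maximises estimated quality minus weighted cost."""
    mb_score = mb_quality - cost_weight * mb_cost
    mf_score = mf_quality - cost_weight * mf_cost
    return "MB" if mb_score >= mf_score else "MF"

Under such a criterion the MB expert tends to be selected right after a task change, when its estimated decision quality exceeds that of the not-yet-readapted MF expert, while the MF expert takes over once its values have converged; this is the kind of behavior the experiments summarized above quantify.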




Data and code availability

The open source code related to this work can be accessed from Figshare: https://doi.org/10.6084/m9.figshare.21511968.v1. The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported by the Délégation Générale de l’Armement (ER, RD), by the Agence Nationale de la Recherche (ANR-12-CORD-0030 Roboergosum Project), by joint funding from ANR and the Austrian Science Fund FWF (ANR-21-CE33-0019-01), by the Centre National de la Recherche Scientifique (INS2I Appel Unique programme; MK), and by the European Union Horizon 2020 research and innovation programme under grant agreement No 761758 “HumanE-AI-Net” (H2020-ICT-48 Network of Centers of Excellence). The authors would like to thank Romain Retureau and Camille Lakhlifi for their help with some of the figures.

Author information


Corresponding author

Correspondence to Mehdi Khamassi.

Ethics declarations

Competing interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF, 6892 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Dromnelle, R., Renaudo, E., Chetouani, M. et al. Reducing Computational Cost During Robot Navigation and Human–Robot Interaction with a Human-Inspired Reinforcement Learning Architecture. Int J of Soc Robotics 15, 1297–1323 (2023). https://doi.org/10.1007/s12369-022-00942-6



  • DOI: https://doi.org/10.1007/s12369-022-00942-6

Keywords

Navigation