Persistent rule-based interactive reinforcement learning

  • S.I.: Human-in-the-loop Machine Learning and its Applications
  • Published in: Neural Computing and Applications

Abstract

Interactive reinforcement learning speeds up the learning process of autonomous agents by including a human trainer who provides extra information to the agent in real time. Current interactive reinforcement learning research has been limited to real-time interactions in which the trainer's advice is relevant only to the current state. Additionally, the information provided by each interaction is not retained; the agent discards it after a single use. In this work, we propose a persistent rule-based interactive reinforcement learning approach, i.e., a method for retaining and reusing provided knowledge that allows trainers to give general advice relevant to more than just the current state. Our experimental results show that persistent advice substantially improves the performance of the agent while reducing the number of interactions required from the trainer. Moreover, rule-based advice shows a performance impact similar to state-based advice, but with a substantially reduced interaction count.
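To make the idea concrete, the sketch below shows one way persistent rule-based advice could wrap a tabular Q-learning agent. This is a minimal illustration, not the authors' implementation: the class name, the representation of rules as (condition, action) pairs, and the first-match retrieval policy are all assumptions introduced here for clarity. The key contrast with single-use, state-based advice is that a stored rule is retained across steps and fires in every state its condition covers.

```python
import random
from collections import defaultdict

# Hypothetical sketch of persistent rule-based advice around tabular
# Q-learning. Names and data structures are illustrative assumptions,
# not the method as published.

class PersistentRuleAdvisedAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)        # Q-values keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rules = []                    # persistent advice store: (condition, action)

    def add_rule(self, condition, action):
        """Trainer supplies general advice as a predicate over states."""
        self.rules.append((condition, action))

    def advised_action(self, state):
        # Reuse retained advice: return the action of the first rule
        # whose condition matches the current state, if any.
        for condition, action in self.rules:
            if condition(state):
                return action
        return None

    def act(self, state):
        advice = self.advised_action(state)
        if advice is not None:
            return advice                  # follow persistent advice
        if random.random() < self.epsilon:
            return random.choice(self.actions)   # explore
        return max(self.actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

Under this sketch, a single trainer interaction such as agent.add_rule(lambda s: s[0] == 0, "right") advises "move right whenever in the leftmost column" and applies to every matching state thereafter, whereas state-based advice would bind to exactly one state and be discarded after one use. This is why rule-based advice can match the performance impact of state-based advice with far fewer interactions.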

Acknowledgments

This work has been partially supported by the Australian Government Research Training Program (RTP) and the RTP Fee-Offset Scholarship through Federation University Australia.

Author information

Corresponding author

Correspondence to Francisco Cruz.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Bignold, A., Cruz, F., Dazeley, R. et al. Persistent rule-based interactive reinforcement learning. Neural Comput & Applic 35, 23411–23428 (2023). https://doi.org/10.1007/s00521-021-06466-w
