Abstract
Model-based reinforcement learning algorithms are typically more sample efficient than their model-free counterparts, especially in sparse-reward problems. Unfortunately, many interesting domains are too complex to specify complete models for, and learning a model from scratch requires a large number of environment samples. If we could specify an incomplete model and allow the agent to learn how best to use it, we could take advantage of our partial understanding of many domains. In this work we propose SAGE, an algorithm combining learning and planning to exploit a previously unusable class of incomplete models. SAGE combines the strengths of symbolic planning and neural learning in a novel way, and outperforms competing methods on variations of taxi world and Minecraft.
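As a rough illustration of the plan-then-act pattern the abstract describes, the sketch below shows one way a symbolic planner over an incomplete high-level model could propose goals for a learned, goal-conditioned low-level policy. This is a minimal sketch under our own assumptions; every interface name here (`planner.next_goal`, `policy.act`, `policy.observe`, `abstract`) is hypothetical and not the paper's API.

```python
# Hypothetical SAGE-style control loop (interface names are ours, not the
# paper's): a symbolic planner queries an incomplete model to propose the
# next symbolic goal, and a learned goal-conditioned policy is rolled out
# to achieve it, learning off-policy from the resulting transitions.

def run_episode(env, planner, policy, abstract, max_goal_steps=50):
    """One episode of plan-then-act with an incomplete symbolic model.

    planner.next_goal(sym)  -> a symbolic subgoal from the partial model
    policy.act(obs, goal)   -> a low-level action conditioned on the goal
    abstract(obs)           -> maps a raw observation to a symbolic state
    """
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        # Plan at the symbolic level; the model may be incomplete, so the
        # proposed goal is a suggestion the low-level learner must realise.
        goal = planner.next_goal(abstract(obs))
        for _ in range(max_goal_steps):
            action = policy.act(obs, goal)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
            # Store the goal-conditioned transition for off-policy updates.
            policy.observe(obs, goal, reward, done)
            if done or abstract(obs).satisfies(goal):
                break  # goal reached (or episode over); replan
    return total_reward
```

The key design point the abstract implies is the division of labour: the hand-specified model only needs to capture high-level structure, while everything it leaves out is absorbed by the learned goal-conditioned policy.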
Cite this paper
Chester, A., Dann, M., Zambetta, F., Thangarajah, J. (2024). SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_22