Abstract
Model-based reinforcement learning algorithms are typically more sample efficient than their model-free counterparts, especially in sparse-reward problems. Unfortunately, many interesting domains are too complex to specify complete models for, and learning a model from scratch requires a large number of environment samples. If we could specify an incomplete model and allow the agent to learn how best to use it, we could take advantage of our partial understanding of many domains. In this work we propose SAGE, an algorithm combining learning and planning to exploit a previously unusable class of incomplete models. SAGE combines the strengths of symbolic planning and neural learning in a novel way, and outperforms competing methods on variations of taxi world and Minecraft.
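As a rough illustration of the plan-then-act pattern the abstract describes, the sketch below shows one way a symbolic planner over an incomplete high-level model could propose goals for a learned, goal-conditioned low-level policy. This is a minimal sketch under our own assumptions; every interface name here (`planner.next_goal`, `policy.act`, `policy.observe`, `abstract`) is hypothetical and not the paper's API.

```python
# Hypothetical SAGE-style control loop (interface names are ours, not the
# paper's): a symbolic planner queries an incomplete model to propose the
# next symbolic goal, and a learned goal-conditioned policy is rolled out
# to achieve it, learning off-policy from the resulting transitions.

def run_episode(env, planner, policy, abstract, max_goal_steps=50):
    """One episode of plan-then-act with an incomplete symbolic model.

    planner.next_goal(sym)  -> a symbolic subgoal from the partial model
    policy.act(obs, goal)   -> a low-level action conditioned on the goal
    abstract(obs)           -> maps a raw observation to a symbolic state
    """
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        # Plan at the symbolic level; the model may be incomplete, so the
        # proposed goal is a suggestion the low-level learner must realise.
        goal = planner.next_goal(abstract(obs))
        for _ in range(max_goal_steps):
            action = policy.act(obs, goal)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
            # Store the goal-conditioned transition for off-policy updates.
            policy.observe(obs, goal, reward, done)
            if done or abstract(obs).satisfies(goal):
                break  # goal reached (or episode over); replan
    return total_reward
```

The key design point the abstract implies is the division of labour: the hand-specified model only needs to capture high-level structure, while everything it leaves out is absorbed by the learned goal-conditioned policy.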
Cite this paper
Chester, A., Dann, M., Zambetta, F., Thangarajah, J. (2024). SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning. In: Liu, T., Webb, G., Yue, L., Wang, D. (eds) AI 2023: Advances in Artificial Intelligence. Lecture Notes in Computer Science, vol 14472. Springer, Singapore. https://doi.org/10.1007/978-981-99-8391-9_22