Abstract
The field of computational reinforcement learning (RL) has proved extremely useful in research on human and animal behavior and brain function. However, the simple forms of RL considered in most empirical research do not scale well, making their relevance to complex, real-world behavior unclear. In computational RL, one strategy for addressing the scaling problem is to introduce hierarchical structure, an approach that has intriguing parallels with human behavior. We have begun to investigate the potential relevance of hierarchical RL (HRL) to human and animal behavior and brain function. In the present chapter, we first review two results demonstrating neural correlates of key predictions from HRL. We then focus on one aspect of this work: the question of how action hierarchies are initially established. Work in HRL suggests that hierarchies are learned by identifying useful subgoal states, which might in turn be discovered through a structural analysis of the given task domain. We review results from a set of behavioral and neuroimaging experiments in which we investigated the relevance of these ideas to human learning and decision making.
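To make the subgoal-discovery idea concrete, the sketch below is a hedged illustration, not the authors' implementation, of one structural-analysis approach from the HRL literature cited in the references (cf. Şimşek and Barto 2009): represent the task domain as a state-transition graph and flag "bottleneck" states, those with high betweenness centrality, as candidate subgoals. The two-room layout, the doorway coordinates, and the networkx-based implementation are all illustrative assumptions.

```python
# A minimal sketch of subgoal discovery via betweenness centrality
# (cf. Simsek & Barto, 2009). Hypothetical toy domain: two 3x3 rooms
# joined by a single doorway; not the task used in the chapter's studies.
import networkx as nx

G = nx.Graph()
rooms = [[(x, y) for x in range(3) for y in range(3)],      # left room
         [(x + 4, y) for x in range(3) for y in range(3)]]  # right room
for room in rooms:
    for (x, y) in room:
        for (dx, dy) in [(1, 0), (0, 1)]:                   # link grid neighbors
            if (x + dx, y + dy) in room:
                G.add_edge((x, y), (x + dx, y + dy))
doorway = (3, 1)                                            # the bottleneck state
G.add_edge((2, 1), doorway)
G.add_edge(doorway, (4, 1))

# Betweenness centrality: the fraction of shortest paths between all state
# pairs that pass through each state. Every cross-room path must traverse
# the doorway, so it scores highest and is flagged as a candidate subgoal.
scores = nx.betweenness_centrality(G)
print(max(scores, key=scores.get))  # -> (3, 1), the doorway
```

An agent that flags such a state could then acquire a temporally abstract action for reaching it (an "option" in the sense of Sutton et al. 1999), which is the sense in which structural analysis of the task domain can bootstrap hierarchy learning.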
Notes
1. This particular result provides preliminary evidence for "model-based" hierarchical planning in the Diuk et al. (2012a) delivery task.
References
Aldridge, J. W., & Berridge, K. C. (1998). Coding of serial order by neostriatal neurons: a “natural action” approach to movement sequence. Journal of Neuroscience, 18(7), 2777–2787.
Badre, D. (2008). Cognitive control, hierarchy, and the rostro-caudal organization of the frontal lobes. Trends in Cognitive Sciences, 12(5), 193–200.
Baldassarre, G., & Mirolli, M. (Eds.) (2012). Intrinsically motivated learning in natural and artificial systems. Berlin: Springer.
Barto, A. G., & Mahadevan, S. (2003). Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4), 341–379.
Botvinick, M., & Plaut, D. C. (2004). Doing without schema hierarchies: a recurrent connectionist approach to normal and impaired routine sequential action. Psychological Review, 111(2), 395–429.
Botvinick, M. M., Niv, Y., & Barto, A. C. (2009). Hierarchically organized behavior and its neural foundations: a reinforcement learning perspective. Cognition, 113(3), 262–280.
Bruner, J. (1975). Organization of early skilled action. Child Development, 44, 1–11.
Conway, C. M., & Christiansen, M. H. (2001). Sequential learning in non-human primates. Trends in Cognitive Sciences, 5(12), 539–546.
Cooper, R., & Shallice, T. (2000). Contention scheduling and the control of routine activities. Cognitive Neuropsychology, 17(4), 297–338.
Daw, N. D., Courville, A. C., & Touretzky, D. S. (2003). Timing and partial observability in the dopamine system. In Advances in neural information processing systems (NIPS). Cambridge: MIT.
Dayan, P., & Hinton, G. E. (1993). Feudal reinforcement learning. In Advances in neural information processing systems 5 (pp. 271–278). San Mateo: Morgan Kaufmann.
Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227–303.
Diuk, C., Cordova, N., Niv, Y., & Botvinick, M. (2012a). Discovering hierarchical task structure. Submitted.
Diuk, C., Tsai, K., Wallis, J., Niv, Y., & Botvinick, M. M. (2012b). Hierarchical learning induces two simultaneous, but separable, prediction errors in human basal ganglia. The Journal of Neuroscience, 33(13), 5797–5805.
Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2007). Evolutionary development of hierarchical learning structures. IEEE Transactions on Evolutionary Computation, 11(2), 249–264.
Fischer, K. W. (1980). A theory of cognitive development: the control and construction of hierarchies of skills. Psychological Review, 87(6), 477–537.
Fuster, J. M. (1997). The prefrontal cortex: anatomy, physiology, and neuropsychology of the frontal lobe, 3rd edn. Philadelphia: Lippincott-Raven.
Haruno, M., & Kawato, M. (2006). Heterarchical reinforcement-learning model for integration of multiple cortico-striatal loops: fMRI examination in stimulus-action-reward association learning. Neural Networks, 19(8), 1242–1254.
Hengst, B. (2002). Discovering hierarchy in reinforcement learning with HEXQ. In Proceedings of the 19th international conference on machine learning, Sydney, Australia.
Houk, J., Adams, J., & Barto, A. (1995). A model of how the basal ganglia generate and use neural signals that predict reinforcement. In J. Houk, J. Davis, & D. Beiser (Eds.), Models of information processing in the basal ganglia. Cambridge: MIT.
Ito, M., & Doya, K. (2011). Multiple representations and algorithms for reinforcement learning in the cortico-basal ganglia circuit. Current Opinion in Neurobiology, 21(3), 368–373.
Joel, D., Niv, Y., & Ruppin, E. (2002). Actor-critic models of the basal ganglia: new anatomical and computational perspectives. Neural Networks, 15, 535–547.
Jonsson, A., & Barto, A. (2006). Causal graph based decomposition of factored MDPs. Journal of Machine Learning Research, 7, 2259–2301.
Koechlin, E., Ody, C., & Kouneiher, F. (2003). The architecture of cognitive control in the human prefrontal cortex. Science, 302(5648), 1181–1185.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior (pp. 112–136). New York: Wiley.
Li, L., Walsh, T. J., & Littman, M. L. (2006). Towards a unified theory of state abstraction for MDPs. In Proceedings of the ninth international symposium on artificial intelligence and mathematics (AMAI-06).
McGovern, A., & Barto, A. G. (2001). Automatic discovery of subgoals in reinforcement learning using diverse density. In Proceedings of the 18th international conference on machine learning.
Menache, I., Mannor, S., & Shimkin, N. (2002). Q-cut: dynamic discovery of sub-goals in reinforcement learning. In European conference on machine learning (ECML 2002) (pp. 295–306).
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the structure of behavior. New York: Adams-Bannister-Cox.
Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16(5), 1936–1947.
O’Doherty, J., Critchley, H., Deichmann, R., & Dolan, R. J. (2003). Dissociating valence of outcome from behavioral control in human orbital and ventral prefrontal cortices. The Journal of Neuroscience, 23(21), 7931–7939.
Opsahl, T., Agneessens, F., & Skvoretz, J. (2010). Node centrality in weighted networks: generalizing degree and shortest paths. Social Networks, 32, 245–251.
Parr, R., & Russell, S. J. (1998). Reinforcement learning with hierarchies of machines. In Advances in neural information processing systems 10. Cambridge: MIT.
Pickett, M., & Barto, A. G. (2002). PolicyBlocks: an algorithm for creating useful macro-actions in reinforcement learning. In Proceedings of the 19th international conference on machine learning.
Ribas-Fernandes, J. J. F., Solway, A., Diuk, C., McGuire, J. T., Barto, A. G., Niv, Y., & Botvinick, M. M. (2011). A neural signature of hierarchical reinforcement learning. Neuron, 71(2), 370–379.
Schank, R. C., & Abelson, R. P. (1977). Scripts, plans, goals, and understanding: an inquiry into human knowledge structures. Hillsdale: Lawrence Erlbaum.
Schapiro, A., Rogers, T., Cordova, N., Turk-Browne, N., & Botvinick, M. (2013). Neural representations of events arise from temporal community structure. Nature Neuroscience, 16, 486–492.
Schembri, M., Mirolli, M., & Baldassarre, G. (2007a). Evolution and learning in an intrinsically motivated reinforcement learning robot. In F. Almeida y Costa, L. M. Rocha, E. Costa, I. Harvey, & A. Coutinho (Eds.), Advances in artificial life. Proceedings of the 9th European conference on artificial life. LNAI (vol. 4648, pp. 294–333). Berlin: Springer.
Schembri, M., Mirolli, M., & Baldassarre, G. (2007b). Evolving internal reinforcers for an intrinsically motivated reinforcement-learning robot. In Y. Demiris, D. Mareschal, B. Scassellati, & J. Weng (Eds.), Proceedings of the 6th international conference on development and learning (pp. E1–E6). London: Imperial College.
Schmidhuber, J. (1991a). A possibility for implementing curiosity and boredom in model-building neural controllers. In Proceedings of the international conference on simulation of adaptive behavior: from animals to animats (pp. 222–227).
Schmidhuber, J. (1991b). Curious model-building control systems. In Proceedings of the international conference on neural networks (vol. 2, pp. 1458–1463).
Schneider, D. W., & Logan, G. D. (2006). Hierarchical control of cognitive processes: switching tasks in sequences. Journal of Experimental Psychology: General, 135(4), 623–640.
Schoenbaum, G., Chiba, A. A., & Gallagher, M. (1999). Neural encoding in orbitofrontal cortex and basolateral amygdala during olfactory discrimination learning. The Journal of Neuroscience, 19(5), 1876–1884.
Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593–1599.
Schultz, W., Tremblay, L., & Hollerman, J. R. (2000). Reward processing in primate orbitofrontal cortex and basal ganglia. Cerebral Cortex, 10(3), 272–284.
Şimşek, O. (2008). Behavioral building blocks for autonomous agents: description, identification, and learning. PhD thesis, University of Massachusetts, Amherst.
Şimşek, O., & Barto, A. G. (2009). Skill characterization based on betweenness. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems 21 (pp. 1497–1504).
Şimşek, O., Wolfe, A. P., & Barto, A. G. (2005). Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the twenty-second international conference on machine learning.
Singh, S., Barto, A. G., & Chentanez, N. (2005). Intrinsically motivated reinforcement learning. In Advances in neural information processing systems 17. Cambridge: MIT.
Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: an introduction. Cambridge: MIT.
Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181–211.
Thrun, S., & Schwartz, A. (1995). Finding structure in reinforcement learning. In G. Tesauro, D. Touretzky, T. Leen (Eds.), Advances in neural information processing systems (NIPS) 7. Cambridge: MIT.
Yamada, S., & Tsuji, S. (1989). Selective learning of macro-operators with perfect causality. In Proceedings of the 11th international joint conference on artificial intelligence (vol. 1, pp. 603–608). San Francisco: Morgan Kaufmann.
Zacks, J. M., Speer, N. K., Swallow, K. M., Braver, T. S., & Reynolds, J. R. (2007). Event perception: a mind-brain perspective. Psychological Bulletin, 133(2), 273–293.