Research article
Reinforcement learning in a bio-connectionist model based in the thalamo-cortical neural circuit
Introduction
Reinforcement learning without bio-connectionism is empty; bio-connectionism without reinforcement learning is blind. We adapt this famous phrase of Immanuel Kant from his Critique of Pure Reason (see Kant & Guyer, 1998) to present a reinforcement learning design based on a model developed in previous work. There, we presented a program that simulates a specific dynamic of the thalamo-cortical biological system, which we named "configuration". The method was baptized bio-connectionism, and the configuration of the thalamo-cortical system was linked with animal perception (Chalita & Lis, 2015). Some issues remained open, such as whether the simulation could be extended to the whole brain rather than only the primary sensory area. Here we present an extended system able to learn by trial and error from its own experience in an unpredictable environment, which places our work in the area of reinforcement learning (RL), the branch of machine learning concerned with so-called "hedonic" learning systems (Mnih et al., 2015; Sutton & Barto, 1998). The application of RL in a bio-connectionist model has many antecedents in studies that used RL to support hypotheses on the functioning of biological structures (see, for example, Chase et al., 2015; Clarke et al., 2014; Collins et al., 2014; Gläscher et al., 2010; Glimcher, 2011; Holroyd & Coles, 2002; Jitsev et al., 2014; Lucantonio et al., 2014; McDannald et al., 2011; Schultz et al., 1997; Senn & Pfister, 2014). The program presented here represents not only the primary sensory area but also the main neocortical areas, and develops a virtual brain called the agent.
In a typical RL program there is a mechanism of reward and punishment, without the need to specify how the task is to be achieved (Kaelbling, Littman, & Moore, 1996). In some cases there is not even a task to be achieved, but only effects the agent tends to avoid and effects it tends to maintain; in those cases we say the agent is simply trying to survive, because we interpret the former as being, for some reason, bad for it and the latter as good for it. The goodness of an action, in this sense, does not depend on a value function that estimates the future accumulated reward of taking that action (contrast with Rao, Bu, Xu, Wang, & Yin, 2009). This is in fact the only operating rule of the animal brain we want to reflect here.
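The contrast with value-function methods can be sketched in a minimal toy agent. This is not the paper's implementation; the class and its names are illustrative assumptions. The agent updates its tendency to repeat an action using only the immediate effect of that action (maintain or avoid), with no estimate of future accumulated reward anywhere in the update:

```python
import random

# Hypothetical sketch (not the authors' code): action selection driven only
# by immediate effects, with no value function estimating future reward.
# Effects the agent tends to avoid weaken the tendency to repeat the action
# that produced them; effects it tends to maintain strengthen it.

class HedonicAgent:
    def __init__(self, actions):
        # Equal initial tendency for every available action.
        self.tendency = {a: 1.0 for a in actions}

    def act(self):
        # Sample an action in proportion to its current tendency.
        total = sum(self.tendency.values())
        r = random.uniform(0, total)
        for a, w in self.tendency.items():
            r -= w
            if r <= 0:
                return a
        return a

    def feel(self, action, effect):
        # 'effect' is +1 (an effect to maintain) or -1 (one to avoid);
        # the update uses only the immediate effect, never a prediction
        # of accumulated future reward.
        self.tendency[action] = max(0.1, self.tendency[action] + 0.5 * effect)
```

Running such an agent in a world where one action always produces an effect to avoid shifts its behavior toward the other actions, purely through immediate "hedonic" feedback.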
The name reinforcement learning points to a dual problem: the agent must learn both how to recognize patterns and which of those patterns it likes and which it does not. Pattern recognition, indeed, must necessarily be solved before taking on the latter issue. This does not mean the reinforcement is delayed with respect to the recognition of the reinforced pattern; both are expected to be synchronous. The distinction between the two aspects of learning occurs at the level of programming, of the algorithms necessary for the operation of the system; in short, it concerns the conditions of learning and not learning itself. We consider the problem of pattern recognition already settled in our previous work (Chalita & Lis, 2015). In the program presented here, we extend those results and explore the second issue: hedonism. What, briefly, is the difference between pattern recognition and what we call hedonism? In these terms, it is the same difference we detect between learning and the reinforcement of learning. Hedonism can thus be considered, in keeping with the name of this field, reinforcement in the recognition of patterns. We follow this definition of RL rigorously and show how, in our design, the reinforcement of learning brings about hedonic effects.
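One way to picture "reinforcement in the recognition of patterns", synchronous with the recognition itself, is a Hebbian weight update scaled by a reinforcement signal delivered at the same time step. This is a generic sketch under that assumption, not the paper's algorithm:

```python
import numpy as np

# Hypothetical sketch: a plain Hebbian outer-product update (pattern
# recognition) is gated by a reinforcement signal arriving at the same
# time step, so learning and its reinforcement are synchronous rather
# than the reinforcement being delayed.

def reinforced_hebbian_step(weights, pre, post, reinforcement, lr=0.1):
    # Positive reinforcement strengthens the co-active connections that
    # recognize the pattern; negative reinforcement weakens them.
    return weights + lr * reinforcement * np.outer(post, pre)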
The need for a neural network framework to bring biology closer to RL is also manifest throughout previous work (Fan et al., 2014; Faußer & Schwenker, 2015a, 2015b; Hill et al., 1994; Lin & Lee, 1991; Nakano et al., 2015; Noel & Pandian, 2014; Senn & Pfister, 2014; Teng et al., 2014; Zhou et al., 2014, among others). The programmed brain is based on three paradigms combined synergistically: (1) artificial neural networks, (2) bio-connectionism and (3) reinforcement learning. The first is the language in which the architecture of the agent is drawn; in this case, a recurrent neural network (Du & Swamy, 2014; Duell et al., 2012; Fausett, 1994). The second supports the core conceptual frame from which it is possible to investigate the link between the agent and cognitive issues. The third provides the opportunity to test the agent in its interaction with an environment and to check the main effects expected according to the theory. The agent is kept as close to biology as possible through the three paradigms, which are, in other words: (1) the biological shape that makes possible the other two, (2) the knowledge of the environment and (3) its relationship with it.
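For readers unfamiliar with the first paradigm, the defining feature of a recurrent network is that activity is fed back as input at the next time step, giving the agent a state that depends on its history. A minimal sketch, assuming a standard Elman-style update rather than the paper's specific architecture:

```python
import numpy as np

# Minimal recurrent-network step (an assumption for illustration, not the
# authors' architecture): the hidden state h is fed back together with the
# new input x, so the network's response depends on what came before.

def rnn_step(x, h, W_in, W_rec, b):
    # New hidden state mixes the current input with the previous state.
    return np.tanh(W_in @ x + W_rec @ h + b)

def run_sequence(xs, h0, W_in, W_rec, b):
    # Feed a sequence of inputs through the recurrence, one step at a time.
    h = h0
    for x in xs:
        h = rnn_step(x, h, W_in, W_rec, b)
    return h
```

Because of the recurrence, two presentations of the same input can yield different states, which is what lets such a network carry an internal dynamic over time.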
Section snippets
Bio-connectionist modeling
A multilayer feed-forward network architecture (Han et al., 2014; Hornik et al., 1989), in which the neurons of each layer project to neurons of another layer or layers, allows a direct translation to the biological neural system by treating a layer as a neural nucleus and, in our case, as a cortical layer as well. To differentiate cortical layers from the layers of the bio-connectionist design, the latter will be called stations from now on. Programmed neurons are grouped in stations
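The "station" vocabulary can be sketched as a small data structure: a station groups programmed neurons and projects to one or more other stations, as a nucleus or cortical layer projects to others. The class, its names and the random initial weights are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Hypothetical sketch of a bio-connectionist "station": a named group of
# programmed neurons with feed-forward projections to other stations.

class Station:
    def __init__(self, name, size):
        self.name = name
        self.size = size
        self.activity = np.zeros(size)
        self.projections = []  # list of (target station, weight matrix)

    def project_to(self, target):
        # Small random weights stand in for a not-yet-configured projection.
        w = np.random.randn(target.size, self.size) * 0.1
        self.projections.append((target, w))

    def propagate(self):
        # Feed-forward transmission of this station's activity to every
        # station it projects to.
        for target, w in self.projections:
            target.activity = np.tanh(w @ self.activity)
```

Chaining stations this way (e.g. a thalamic station projecting to a cortical one) mirrors the translation from network layers to nuclei and cortical layers described above.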
Macroarchitecture
The program has two main modules: a world and a soma. The world is a video game-type world, and the soma is the RL module, composed of the brain and a translator device that connects the brain signals with the world signals (Fig. 4). The brain (or agent) is assigned to a character, which can be somebody within the world or the world itself (Cox & Dean, 2014; Raju et al., 2012), in which case the whole video game world is itself the character of the agent. In any case,
Results
Three of the graphs developed to test the functioning of the brain are now described. It is not possible to show a graph corresponding to a complete successful process due to its excessive length. Because of this, when operating the platform, once the values of the set of variables yielding the optimum functioning of the brain are obtained, the graphics system is disconnected so that the brain can be tested in the arrow world without a time limit (without a limited number of pulses). The first graph (Fig. 8a) is a
Interpretation of the results
All graphs and tables shown in the previous section were developed to test the configuration (i.e. the learning) process, and the last one, the game world where the agent applies what it learns, tested the hedonism (i.e. the reinforcement of learning; see Section 'Introduction'). An unfavorable environment activates, through the soma (Section 'Macroarchitecture'), the mechanism that prevents formation of synaptic memory, which is detrimental to the configuration, while a favorable environment promotes
Concluding remarks
RL without a bio-connectionist complement now appears empty: it produces an agent that shows results but says nothing about the cognitive process that works as the condition of those results. Bio-connectionism without RL is also incomplete and presents a blind agent, which resembles the myth of the ghost without its machine, a mind without a body, or a world. Despite the enormous difficulties of carrying out such a challenging program, we have tried to come upon the optimum results to
References (116)
- et al. NeuroNames brain hierarchy. Neuroimage (1995).
- et al. An automated signalized junction controller that learns strategies from a human expert. Engineering Applications of Artificial Intelligence (2012).
- et al. Neural networks and neuroscience-inspired computer vision. Current Biology (2014).
- et al. The organization of corticothalamic projections: Reciprocity versus parity. Brain Research Reviews (1998).
- et al. The projection of the auditory cortex upon the diencephalon and brain stem in the cat. Brain Research (1969).
- On simple representations of stopping times and stopping time sigma-algebras. Statistics and Probability Letters (2013).
- et al. Amygdala input monosynaptically innervates parvalbumin immunoreactive local circuit neurons in rat medial prefrontal cortex. Neuroscience (2006).
- et al. States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron (2010).
- et al. Hierarchical extreme learning machine for feedforward neural network. Neurocomputing (2014).
- et al. Motor maps and the cortical control of movement. Current Opinion in Neurobiology (2014).
- Artificial neural network models for forecasting and decision making. International Journal of Forecasting.
- Multilayer feedforward networks are universal approximators. Neural Networks.
- The thalamic matrix and thalamocortical synchrony. Trends in Neurosciences.
- Biological context of Hebb learning in artificial neural networks, a review. Neurocomputing.
- Associative memory models: From the cell-assembly theory to biophysically detailed cortex simulations. Trends in Neurosciences.
- Transition from ‘model-based’ to ‘model-free’ behavioral control in addiction: Involvement of the orbitofrontal cortex and dorsolateral striatum. Neuropharmacology.
- The importance of being hierarchical. Current Opinion in Neurobiology.
- Control of a nonlinear liquid level system using a new artificial neural network based reinforcement learning approach. Applied Soft Computing.
- Sparse coding of sensory inputs. Current Opinion in Neurobiology.
- Questioning the role of sparse coding in the brain. Trends in Neurosciences.
- S1 laminar specialization. Scholarpedia.
- State abstraction for programmable reinforcement learning agents.
- Adaptive control.
- Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology.
- Unraveling the multiscale structural organization and connectivity of the human brain: The role of diffusion MRI. Frontiers in Neuroanatomy.
- Long association fibers in cerebral hemispheres of monkey and chimpanzee. Journal of Neurophysiology.
- A Markovian decision process. Indiana University Mathematics Journal.
- Heaviside’s operational calculus as applied to engineering and physics.
- Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society B: Biological Sciences.
- Handling stochastic reward delays in machine reinforcement learning.
- Sign functions in natural and artificial systems.
- Bio-connectionist model based in the thalamo-cortical circuit. Cuadernos de Neuropsicologia.
- Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis. Cognitive, Affective, & Behavioral Neuroscience.
- Orbitofrontal dopamine depletion upregulates caudate dopamine and alters behavior via changes in reinforcement sensitivity. The Journal of Neuroscience.
- Working memory contributions to reinforcement learning impairments in schizophrenia. The Journal of Neuroscience.
- Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience.
- Habits, action sequences and reinforcement learning. European Journal of Neuroscience.
- Instrumental conditioning.
- Recurrent neural networks.
- Solving partially observable reinforcement learning problems with recurrent neural networks.
- Self-optimization of coverage and capacity based on a fuzzy neural network with cooperative reinforcement learning. EURASIP Journal on Wireless Communications and Networking.
- Neural network ensembles in reinforcement learning. Neural Processing Letters.
- Selective neural network ensembles in reinforcement learning: Taking the advantage of many agents. Neurocomputing.
- Fundamentals of neural networks. Computer.
- Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex.
- Dopaminergic regulation of inhibitory and excitatory transmission in the basolateral amygdala–prefrontal cortical pathway. The Journal of Neuroscience.
- Sparse coding. Scholarpedia.
- Sparse coding in the primate cortex.
- Head first design patterns.
Cited by (3)
- Beyond the frame problem: what (else) can Heidegger do for AI? AI and Society (2023).
- Applying a neural network architecture with spatio-temporal connections to the maze exploration. Advances in Intelligent Systems and Computing (2018).