Research article
Reinforcement learning in a bio-connectionist model based on the thalamo-cortical neural circuit

https://doi.org/10.1016/j.bica.2016.03.001

Abstract

In a previous study, we presented a program to simulate a particular dynamic of the thalamo-cortical biological system. The method, called bio-connectionism, linked the reproduced thalamo-cortical mechanism with animal perception. In the present work, a reinforcement learning program is supported by this mechanism. In a game world designed to test the model, the agent is assigned to a character that must learn by trial and error, from its own experience, upon recognition of aversive and appetitive patterns. The results confirm, support and extend the notion of configuration, a term familiar from sparse coding principles. If, as documented, this mechanism observed in sensory areas can be regarded as a condition of perception, then the brain areas taken together – each in its interaction with a respective sub-thalamic nucleus – may likewise be regarded as a condition of cognition. In the discussion section we introduce some philosophical questions derived from the experimental results.

Introduction

Reinforcement Learning without Bio-connectionism is empty; Bio-connectionism without Reinforcement Learning is blind. We adapted the famous phrase from Immanuel Kant's Critique of Pure Reason (see Kant & Guyer, 1998) to present a reinforcement learning design based on a model developed in a previous work. There, we presented a program to simulate a specific dynamic of the thalamo-cortical biological system, which we named “configuration”. The method was named bio-connectionism, and the configuration of the thalamo-cortical system was linked with animal perception (Chalita & Lis, 2015). Some issues remained outstanding, such as the possibility of extending the simulation to the whole brain, not just the primary sensory area. In the present work we describe an extended system able to learn by trial and error from its own experience in an unpredictable environment, so we frame our explanation within reinforcement learning (RL), the branch of machine learning concerned with the so-called “hedonic” learning system (Mnih et al., 2015, Sutton and Barto, 1998). The application of RL in a bio-connectionist model has antecedents in many previous studies that used RL to support hypotheses on the functioning of biological structures (see for example Chase et al., 2015, Clarke et al., 2014, Collins et al., 2014, Gläscher et al., 2010, Glimcher, 2011, Holroyd and Coles, 2002, Jitsev et al., 2014, Lucantonio et al., 2014, McDannald et al., 2011, Schultz et al., 1997, Senn and Pfister, 2014). Here we show a program that represents not only the primary sensory area but also the main neocortical areas, and we develop a virtual brain called the agent.

In a common RL program, there is a mechanism of reward and punishment without the need to specify how the task is to be achieved (Kaelbling, Littman, & Moore, 1996). In some cases there is not even a task to be achieved, only effects the agent tends to avoid and effects it tends to maintain; in those cases we say the agent is simply trying to survive, because we interpret the former as being, for some reason, bad for it and the latter as good for it. The goodness of an action, in this sense, does not depend on a value function that estimates the future accumulated reward obtained by taking that action (contrast with Rao, Bu, Xu, Wang, & Yin, 2009). This is, in fact, the only operating rule of the animal brain we want to reflect here.
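As a rough illustration (our own sketch, not the authors' code), the rule just described can be written as a reaction that depends only on the current effect, with no value function estimating future accumulated reward; the action set and function name below are hypothetical.

```python
# Minimal sketch of the operating rule described above (an assumption, not the
# authors' implementation): avoid aversive effects, maintain appetitive ones,
# with no value function estimating future accumulated reward.
import random

ACTIONS = ["left", "right", "up", "down"]   # hypothetical action set

def react(effect, current_action):
    """One reaction step; 'effect' is 'aversive', 'appetitive' or 'neutral'."""
    if effect == "aversive":
        # bad for the agent: abandon the current behaviour
        return random.choice([a for a in ACTIONS if a != current_action])
    # good or indifferent for the agent: keep doing what it is doing
    return current_action
```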

The name reinforcement learning points to a dual problem: the agent must learn both how to recognize patterns and which of those patterns it likes and which it does not. Pattern recognition must necessarily already be solved when the latter issue is addressed. This does not mean the reinforcement is delayed with respect to the recognition of the reinforced pattern; both are expected to be synchronous. The distinction between the two aspects of learning occurs at the level of programming, of the algorithms necessary for the operation of the system; in short, it concerns the conditions of learning and not learning itself. We consider the problem of pattern recognition already settled in our previous work (Chalita & Lis, 2015). In the program presented here, we extend those results and explore the second issue: hedonism. What, briefly, is the difference between pattern recognition and what we call hedonism? In these terms, it is the same difference we find between learning and the reinforcement of learning. Hedonism can thus be considered, following the name of the field, reinforcement in the recognition of patterns. We follow this definition of RL rigorously and show how, in our design, the reinforcement of learning brings about hedonic effects.

The need for a neural network framework to bring biology closer to RL is also well documented (see, among others, Fan et al., 2014, Faußer and Schwenker, 2015a, Faußer and Schwenker, 2015b, Hill et al., 1994, Lin and Lee, 1991, Nakano et al., 2015, Noel and Pandian, 2014, Senn and Pfister, 2014, Teng et al., 2014, Zhou et al., 2014). The programmed brain is based on three paradigms synergistically combined: (1) artificial neural networks, (2) bio-connectionism and (3) reinforcement learning. The first is the language in which the architecture of the agent is drawn; in this case, a recurrent neural network (Du and Swamy, 2014, Duell et al., 2012, Fausett, 1994). The second provides the core conceptual frame from which it is possible to investigate the link between the agent and cognitive issues. The third provides the opportunity to test the agent in its interaction with an environment and to check the main effects expected according to the theory. The agent is kept as close as possible to biology through the three paradigms mentioned, which are, in other words, (1) the biological shape that makes possible the other two: (2) knowledge of the environment and (3) the relationship with it.
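The text does not give the network equations, so the following is only a generic recurrent-layer update (a standard Elman-style recurrence) meant to make the recurrent neural network framing concrete; it is not the authors' architecture.

```python
# Generic recurrent update shown only to illustrate the recurrent-network
# paradigm; the authors' actual equations are not reproduced in this text.
import numpy as np

def recurrent_step(x_t, h_prev, W_in, W_rec, b):
    """The new hidden state depends on the current input and the previous
    state, giving the agent a memory of its recent past."""
    return np.tanh(W_in @ x_t + W_rec @ h_prev + b)
```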

Section snippets

Bio-connectionist modeling

A multilayer feed-forward network architecture (Han et al., 2014, Hornik et al., 1989), in which the neurons of each layer project to neurons of another layer or layers, allows a direct translation to the biological neural system by treating a layer as a neural nucleus and, in our case, as a cortical layer as well. To differentiate cortical layers from the layers of the bio-connectionist design, the latter will be called stations from now on. Programmed neurons are grouped in stations
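A minimal sketch of the station idea, assuming only the grouping and projection behaviour described above; the class and attribute names are our own, not the authors' implementation.

```python
# Hypothetical sketch of a "station": a group of programmed neurons treated as
# a neural nucleus or cortical layer, projecting feed-forward to other stations.
import numpy as np

class Station:
    def __init__(self, n_neurons, name):
        self.name = name
        self.activity = np.zeros(n_neurons)
        self.projections = []                  # list of (target, weight matrix)

    def project_to(self, target, scale=0.1):
        rng = np.random.default_rng()
        w = rng.normal(scale=scale, size=(len(target.activity), len(self.activity)))
        self.projections.append((target, w))

    def propagate(self):
        # feed-forward transmission: each projection drives its target station
        for target, w in self.projections:
            target.activity = np.maximum(0.0, w @ self.activity)

# Example: a thalamic station projecting to a cortical one
thalamus = Station(64, "thalamic nucleus")
cortex = Station(128, "cortical layer IV")
thalamus.project_to(cortex)
```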

Macroarchitecture

The program has two main modules: a world and a soma. The world is a video game-type world, and the soma is the RL module composed of the brain and a translator device that connects the brain signals with the world signals (Fig. 4). The brain (or agent) is assigned to a character, which can be somebody within the world or the world itself (Cox and Dean, 2014, Raju et al., 2012), in which case we do not hesitate to say that the whole video game world is itself the character of the agent. In any case,
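The two-module layout described above can be sketched roughly as follows; the class names, the brain's `step` method and the placeholder encodings are assumptions for illustration, not the authors' code.

```python
# Illustrative sketch of the macroarchitecture: a world module and a soma
# module, where the soma holds the brain (agent) and a translator device.
class Translator:
    """Connects brain signals with world signals (placeholder encodings)."""
    def world_to_brain(self, world_signal):
        return world_signal

    def brain_to_world(self, brain_signal):
        return brain_signal

class Soma:
    """The RL module: the brain plus its translator device."""
    def __init__(self, brain, translator):
        self.brain = brain
        self.translator = translator

    def act(self, world_signal):
        stimulus = self.translator.world_to_brain(world_signal)
        response = self.brain.step(stimulus)      # 'step' is a hypothetical API
        return self.translator.brain_to_world(response)
```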

Results

Three of the graphs developed to test the functioning of the brain are now described. It is not possible to show a graph corresponding to a complete successful process owing to its excessive length. Because of this, when operating the platform, once the set of variable values for the optimum functioning of the brain is obtained, the graphics system is disconnected in order to test the brain in the arrow world without time limit (without a limited number of pulses). The first graph (Fig. 8a) is a

Interpretation of the results

All graphs and tables shown in the previous section were developed to test the configuration (i.e. the learning) process, and the last one, the game world where the agent applies what it learns, tested hedonism (i.e. the reinforcement of learning; see Section ‘Introduction’). An unfavorable environment activates, through the soma (Section ‘Macroarchitecture’), the mechanism that prevents the formation of synaptic memory, which is detrimental to the configuration, while a favorable environment promotes
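A hedged sketch of this gating idea: the environmental signal relayed through the soma either permits or blocks the formation of synaptic memory. The update rule below is a generic gated Hebbian rule chosen for illustration, not the rule used in the paper.

```python
# Generic gated Hebbian update (illustrative only): a favorable environment
# promotes formation of synaptic memory, an unfavorable one prevents it.
import numpy as np

def gated_hebbian_update(weights, pre, post, environment, lr=0.01):
    if environment == "favorable":
        gate = 1.0       # promote synaptic memory formation
    elif environment == "unfavorable":
        gate = 0.0       # prevent synaptic memory formation
    else:
        gate = 0.5       # neutral case (assumption)
    return weights + lr * gate * np.outer(post, pre)
```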

Concluding remarks

RL without a bio-connectionist complement now appears to be empty: it produces an agent that shows results but says nothing about the cognitive process that works as the condition of those results. Bio-connectionism without RL is also incomplete and presents a blind agent, resembling the myth of the ghost without its machine, a mind without a body – or a world. Despite the enormous difficulties of carrying out such a challenging program, we have tried to come upon the optimum results to

References (116)

  • T. Hill et al. Artificial neural network models for forecasting and decision making. International Journal of Forecasting (1994)
  • K. Hornik et al. Multilayer feedforward networks are universal approximators. Neural Networks (1989)
  • E.G. Jones. The thalamic matrix and thalamocortical synchrony. Trends in Neurosciences (2001)
  • E. Kuriscak et al. Biological context of Hebb learning in artificial neural networks, a review. Neurocomputing (2015)
  • A. Lansner. Associative memory models: From the cell-assembly theory to biophysically detailed cortex simulations. Trends in Neurosciences (2009)
  • F. Lucantonio et al. Transition from ‘model-based’ to ‘model-free’ behavioral control in addiction: Involvement of the orbitofrontal cortex and dorsolateral striatum. Neuropharmacology (2014)
  • N.T. Markov et al. The importance of being hierarchical. Current Opinion in Neurobiology (2013)
  • M.M. Noel et al. Control of a nonlinear liquid level system using a new artificial neural network based reinforcement learning approach. Applied Soft Computing (2014)
  • B.A. Olshausen et al. Sparse coding of sensory inputs. Current Opinion in Neurobiology (2004)
  • A. Spanne et al. Questioning the role of sparse coding in the brain. Trends in Neurosciences (2015)
  • E. Ahissar et al. S1 laminar specialization. Scholarpedia (2010)
  • D. Andre et al. State abstraction for programmable reinforcement learning agents
  • K.J. Åström et al. Adaptive control (2013)
  • D. Aur et al. (2010)
  • B.W. Balleine et al. Human and rodent homologies in action control: Corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology (2010)
  • M. Bastiani et al. Unraveling the multiscale structural organization and connectivity of the human brain: The role of diffusion MRI. Frontiers in Neuroanatomy (2015)
  • P. Bailey et al. Long association fibers in cerebral hemispheres of monkey and chimpanzee. Journal of Neurophysiology (1943)
  • R. Bellman. A Markovian decision process. Indiana University Mathematics Journal (1957)
  • E.J. Berg. Heaviside’s operational calculus as applied to engineering and physics (1936)
  • M. Botvinick et al. Model-based hierarchical reinforcement learning and human action control. Philosophical Transactions of the Royal Society B: Biological Sciences (2014)
  • J.S. Campbell et al. Handling stochastic reward delays in machine reinforcement learning
  • P. Cariani. Sign functions in natural and artificial systems
  • M.A. Chalita et al. Bio-connectionist model based in the thalamo-cortical circuit. Cuadernos de Neuropsicologia (2015)
  • H.W. Chase et al. Reinforcement learning models and their neural correlates: An activation likelihood estimation meta-analysis. Cognitive, Affective, & Behavioral Neuroscience (2015)
  • H.F. Clarke et al. Orbitofrontal dopamine depletion upregulates caudate dopamine and alters behavior via changes in reinforcement sensitivity. The Journal of Neuroscience (2014)
  • A.G. Collins et al. Working memory contributions to reinforcement learning impairments in schizophrenia. The Journal of Neuroscience (2014)
  • N.D. Daw et al. Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature Neuroscience (2005)
  • A. Dezfouli et al. Habits, action sequences and reinforcement learning. European Journal of Neuroscience (2012)
  • A. Dickinson. Instrumental conditioning
  • K.L. Du et al. Recurrent neural networks
  • S. Duell et al. Solving partially observable reinforcement learning problems with recurrent neural networks
  • S. Fan et al. Self-optimization of coverage and capacity based on a fuzzy neural network with cooperative reinforcement learning. EURASIP Journal on Wireless Communications and Networking (2014)
  • S. Faußer et al. Neural network ensembles in reinforcement learning. Neural Processing Letters (2015)
  • S. Faußer et al. Selective neural network ensembles in reinforcement learning: Taking the advantage of many agents. Neurocomputing (2015)
  • L. Fausett. Fundamentals of neural networks. Computer (1994)
  • D.J. Felleman et al. Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex (1991)
  • S.B. Floresco et al. Dopaminergic regulation of inhibitory and excitatory transmission in the basolateral amygdala–prefrontal cortical pathway. The Journal of Neuroscience (2007)
  • P. Foldiak et al. Sparse coding. Scholarpedia (2008)
  • P. Földiák et al. Sparse coding in the primate cortex
  • E. Freeman et al. Head first design patterns (2004)