Neuroscience

Volume 458, 15 March 2021, Pages 166-179

Research Article
A Causal Role for the Right Dorsolateral Prefrontal Cortex in Avoidance of Risky Choices and Making Advantageous Selections

https://doi.org/10.1016/j.neuroscience.2020.12.035

Highlights

  • Using theta burst stimulation, we show that the DLPFC is causally involved in risky decision-making.

  • Inhibitory theta burst protocols over the right and left DLPFC decreased explorative behaviour.

  • Inhibitory theta burst protocols over the right DLPFC enhanced sensitivity to reward value, resulting in risk avoidance.

  • The right DLPFC directs behaviour to avoid risky choices and adopt more cautious but beneficial choices.

Abstract

In everyday life, risky decision-making relies on multiple cognitive processes including sensitivity to reinforcers, exploration, learning, and forgetting. Neuroimaging evidence suggests that the dorsolateral prefrontal cortex (DLPFC) is involved in exploration and risky decision-making, but the nature of its computations and its causal role remain uncertain. We provide evidence for the role of the DLPFC in value-independent, directed exploration on the Iowa Gambling Task (IGT) and we describe a new computational model to account for the competition between directed exploration and exploitation in guiding decisions. Forty-two healthy human participants were assigned to right DLPFC, left DLPFC or sham stimulation groups receiving continuous theta-burst stimulation (cTBS). Immediately after cTBS, participants completed the IGT. Computational modelling was used to account for exploration and exploitation, in combination with value-based learning and sensitivity to reinforcers, for each group. Applying cTBS to the left and right DLPFC selectively decreased directed exploration on the IGT compared to sham stimulation. Model-based analyses further indicated that right (but not left) DLPFC stimulation increased sensitivity to reinforcers, leading to avoidance of risky choices and promoting advantageous choices during the task. Although these findings are based on small sample sizes per group, they nevertheless elucidate the causal role of the right DLPFC in governing the exploration–exploitation tradeoff during decision-making in uncertain and ambiguous contexts.

Introduction

Adaptive behaviour requires alternating between and updating decision-making mechanisms. To this end, humans can act directly on previous knowledge (exploitation) or try to avoid risky options by gaining additional decision time and accumulating additional information (exploration). Exploration and exploitation are two parallel cognitive strategies that compete and integrate information relevant to a new context (Berger-Tal et al., 2014). Exploration is often seen as a value-dependent stochastic decision rule, in which motivational inputs such as incentive value or value sensitivity shape exploratory mechanisms that are highly dependent on goal-directed processes (Watson et al., 2018). If the agent can anticipate the value associated with a particular context, the optimal decision is the one that maximises the outcome value. Given a good estimate of the upcoming reward, a good decision is the one that most benefits the agent (exploitation). Initial exploitation, mediated through trial and error, requires learning and establishing the correct value function.

Importantly, when the context changes critically over time, the outcome values learned from previous experience are no longer optimal. Outcome estimates must be updated, and the agent should resume trial and error, i.e. make decisions that are not optimal under the current value estimates (exploration). Reinforcement learning is based on these two strategies, which cannot be deployed simultaneously but only sequentially, thereby recruiting multiple cognitive domains to generate optimal behaviour. Hence, to generate adaptive behaviour in risky and uncertain contexts, complex cognitive computations that shift between exploration and exploitation are crucial to maximise the probability of success and to adapt ongoing behaviour.
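To make this competition concrete, the following sketch contrasts an exploitation term (a delta-rule value estimate) with a directed-exploration bonus that grows for options left unsampled. It is purely illustrative: the parameter names and values are assumptions, not the model fitted in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, beta):
    """Convert option values into choice probabilities; beta is choice consistency."""
    z = beta * (x - x.max())
    p = np.exp(z)
    return p / p.sum()

# Illustrative parameters (not taken from this study)
n_options, n_trials = 4, 100
alpha, beta, bonus = 0.3, 2.0, 0.5   # learning rate, consistency, exploration bonus

q = np.zeros(n_options)              # learned (exploitation) values
unsampled = np.zeros(n_options)      # trials since each option was last chosen

for t in range(n_trials):
    # Options ignored for longer receive a larger directed-exploration bonus
    p = softmax(q + bonus * unsampled, beta)
    choice = rng.choice(n_options, p=p)
    outcome = rng.normal(0.0, 1.0)    # placeholder outcome; replace with task payoffs
    q[choice] += alpha * (outcome - q[choice])   # delta-rule value update (exploitation)
    unsampled += 1
    unsampled[choice] = 0
```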

Multiple executive processes are required to orchestrate reinforcement-learning mechanisms in response to choice outcomes. In new contexts, exploration entails greater risk, in contrast to stable contexts where no exploration is required. Exploitation thus relies on safer, well-learned contingencies that generate less risk. Brain lesion studies (Floden et al., 2008) and neuroimaging (Li et al., 2010, Yu et al., 2014) have contributed much to investigating the brain–behaviour associations involved in executive control, risk avoidance and exploration in many reinforcement-learning tasks. The right dorsolateral prefrontal cortex (DLPFC) has been associated with several functions that support exploratory behaviour towards reward (Li et al., 2020), such as updating actions towards favourable choices (Morris et al., 2014) or when the variability of value predictions is important (Kahnt et al., 2011). Additionally, recruitment of other regions such as the frontopolar cortex is required in explorative scenarios when arbitration between multiple goals is needed (Pollmann, 2016) or when one needs to plan a sequence of actions in order to gather useful information (Zajkowski et al., 2017). Meanwhile, when exploration tendencies and reward learning are demanded, increased activity of the left DLPFC suggests a role in overriding exploitation mechanisms and perseverative tendencies (Hare et al., 2014).

Importantly, DLPFC connectivity with the ventromedial prefrontal cortex (vmPFC) has been shown to support the computation of stimulus value during decisions that required waiting strategies during exploitation (Hare et al., 2014, Nejati et al., 2018) and value computations (Sokol-Hessner et al., 2012). Recent results suggest that the vmPFC executes decisions to explore or exploit with help from the DLPFC or ACC (Trudel et al., 2020), whereby the DLPFC was recruited when uncertainty was represented positively during exploration.

Given that neuroimaging studies do not establish a direct functional relation to the observed behaviour, causal evidence (using TMS or tDCS) has proven informative about the roles of cortical regions in building exploitation–exploration preferences. Repetitive transcranial magnetic stimulation (rTMS) over cortical areas is a powerful tool to modulate cortical activity and make strong inferences about the causal involvement of brain areas in specific cognitive processes (Jahanshahi and Rothwell, 2000), similar to focal brain lesions (Floden et al., 2008, Obeso et al., 2014). Experiments using direct or alternating current stimulation of the DLPFC have shown different effects on risk behaviours. Some report an effect of left DLPFC stimulation only, whereby anodal left/cathodal right stimulation decreased risk-taking in the ‘cold’ cognition version of the Columbia Card Task (Pripfl et al., 2013) or in the Balloon Analogue Risk Task (Sela et al., 2012). Similar studies using rTMS indicate that the right (Knoch et al., 2006a, Knoch et al., 2006b, Tulviste and Bachmann, 2019) and left DLPFC (He et al., 2016) are both directly involved in shaping and controlling risky behaviours during motor actions. Overall, the above evidence suggests that the DLPFC is involved in learning from uncertain outcomes and in decision-making, in both exploration and exploitation, but the exact nature of its computations and the causal contribution of the left and right hemispheres remain largely unknown.

The Iowa Gambling Task (IGT; Bechara et al., 1994) involves risky decision-making that is conducive to the use of both exploitation and exploration strategies. Participants select cards from four different decks, with positive (advantageous decks) or negative (disadvantageous decks) consequences. They are told that each card is associated with different quantities of reward or punishment, and that a total monetary amount will be collected based on their performance. Their objective is to earn as much money as possible through their choices and learning. Through trial and error, participants need to learn the best choice option amongst the four decks. Typical behaviour on the task is to sample cards from each deck until stable outcome learning is reached. Exploration plays a pivotal and often overlooked role in maximising the final pay-off in the IGT, where choice perseverance is a detrimental strategy. However, one critical aspect of the IGT is that reward contingencies are fixed, which can lead participants to reduce exploration and increase monetary payoffs by consolidating exploitation behaviours. Healthy participants show adaptive learning and selection of the adequate decks around trials 40–50 of the task (Steingroever et al., 2015), where varying degrees of attentional effort modulate learning from outcomes (Hawthorne and Pierce, 2015), while clinical samples show marked decision‐making deficits in reward learning (Stout et al., 2001, Shurman et al., 2005, Evens et al., 2016). Hence, the possibility of differentiating between reliance on exploration and exploitation motivated our selection of the IGT for the current study.
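For readers unfamiliar with the task, the sketch below simulates a simplified payoff schedule modelled on the commonly reported IGT deck structure (Bechara et al., 1994); the exact loss magnitudes and frequencies are illustrative assumptions, not the schedule administered in this study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simplified payoff schedule in the spirit of the classic IGT:
# decks A/B pay large wins but carry larger average losses (disadvantageous),
# decks C/D pay small wins with small losses (advantageous).
DECKS = {
    "A": dict(win=100, loss=-250,  p_loss=0.5),   # net-negative in the long run
    "B": dict(win=100, loss=-1250, p_loss=0.1),
    "C": dict(win=50,  loss=-50,   p_loss=0.5),   # net-positive in the long run
    "D": dict(win=50,  loss=-250,  p_loss=0.1),
}

def draw(deck_name):
    """Return the net outcome of one card drawn from the named deck."""
    d = DECKS[deck_name]
    loss = d["loss"] if rng.random() < d["p_loss"] else 0
    return d["win"] + loss

# Baseline: purely random selection over 100 trials
total = sum(draw(rng.choice(list(DECKS))) for _ in range(100))
print(f"Net outcome after 100 random draws: {total}")
```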

The IGT is a well-recognised task for obtaining measures of learning and adaptive decision‐making. Yet, the underlying neurocognitive processes of exploration and exploitation are difficult to unravel using behavioural data alone. A new computational architecture (Value plus Sequential Exploration, VSE model; Ligneul, 2019) enabled us to distinguish two different forms of exploration while simultaneously accounting for value sensitivity and outcome (Wilson et al., 2014). Previous cognitive models of the IGT were designed to identify valence effects and possible perseverative patterns of behaviour. The Expectancy‐Valence Learning (EVL) model (Busemeyer and Stout, 2002) and the Prospect Valence Learning model with delta rule (PVL‐Delta) were used to obtain precise long‐term prediction accuracy and parameter recovery (Steingroever et al., 2013, Steingroever et al., 2014), while the Value‐Plus‐Perseverance (VPP) model shows excellent short‐term prediction accuracy (Ahn et al., 2014). In contrast, we aimed to disentangle value-independent directed exploration. This may capture a prevalent tendency of participants to sample options which are more uncertain, have never been selected, or have not been selected for a long time. The new architecture has previously been shown to outperform alternative models in the field using an independent dataset of 504 participants (Steingroever et al., 2015, Ligneul, 2019).
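As a rough orientation to how such an architecture separates the two processes, the schematic below keeps separate exploitation and sequential-exploration weights per deck and combines them in a single softmax choice rule. The update rules and parameter names are our approximations of a VSE-style learner, not the authors' exact implementation (Ligneul, 2019).

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed parameters of a VSE-style learner (illustrative values)
theta, decay = 0.5, 0.8              # value sensitivity, memory decay for exploit weights
alpha_e, phi, beta = 0.3, 1.0, 1.5   # exploration learning rate, exploration bonus, consistency

exploit = np.zeros(4)                # value-based (exploitation) weights per deck
explore = np.zeros(4)                # sequential-exploration weights per deck

def utility(gain, loss):
    """Outcome utility with diminishing sensitivity controlled by theta."""
    return gain ** theta - abs(loss) ** theta

def vse_step(choice, gain, loss):
    exploit[:] *= decay                           # all exploit weights decay (forgetting)
    exploit[choice] += utility(gain, loss)        # chosen deck accrues outcome utility
    unchosen = np.arange(4) != choice
    explore[unchosen] += alpha_e * (phi - explore[unchosen])  # urge to revisit ignored decks grows
    explore[choice] = 0.0                         # exploration urge resets once a deck is sampled

def choose():
    v = beta * (exploit + explore)
    p = np.exp(v - v.max())
    return rng.choice(4, p=p / p.sum())           # softmax over combined weights

# One trial, e.g. a 100 win paired with a 250 loss on the chosen deck:
c = choose()
vse_step(c, gain=100, loss=-250)
```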

To test whether the DLPFC plays a causal role in exploratory strategy, we used continuous theta burst stimulation (cTBS) to temporarily interfere with the neural activity of this region in healthy participants and measured exploration-based learning using the IGT. Thanks to the independent tracking of exploitation and exploration processes, together with value sensitivity to choice outcomes, the VSE model combined with TMS provides mechanistic insight into the causal role of the DLPFC in controlling exploration. We therefore hypothesised that cTBS over the right DLPFC (compared to sham stimulation) would synergistically affect directed exploration and sensitivity to reinforcers, promoting exploitative behaviour in line with its role in updating behaviour towards favourable choices (Morris et al., 2014). In contrast, we predicted that cTBS over the left DLPFC would influence exploitation mechanisms due to its role in overriding and controlling exploitation options (Hare et al., 2014).

Section snippets

Participants

Forty-two right-handed healthy Caucasian participants were paid to take part in the study. Participants were recruited through advertisement and were assessed at the UCL Queen Square Institute of Neurology by the same investigators (IO, MTH). None of the participants had a history of physical, neurological or psychiatric illness, drug or alcohol abuse, or head injury. Prior to participation, all were screened with the TMS safety questionnaire (Keel et al., 2001) and informed consent was obtained

Results

Groups did not differ in terms of age (F(2, 31) = 1.60, p > .05), IQ (F(2, 31) = 1.19, p > .05) or sex distribution (χ²(1) = 0.15, p > .05). No effects of personality or impulsivity traits were found on the results. One-way ANOVA on the BIS with Groups (right DLPFC vs. left DLPFC vs. Sham M1) revealed no main effects (F(2, 31) = 0.59, p = .55). Similarly, one-way ANOVA on the measure of personality (Tridimensional Personality Questionnaire, TPQ) showed no differences between groups (right DLPFC vs. left

Discussion

This study reveals the specific computations by which the DLPFC successfully adapts behaviour in uncertain contexts. A combined non-invasive brain stimulation (cTBS) and model-based approach was used to uncover the role of the DLPFC (both right and left) in directed exploration on the IGT. As expected, a clear reduction of directed exploration was observed after right DLPFC stimulation and to a lesser extent after left DLPFC stimulation. Computational modelling showed that this decrease was driven

Acknowledgements

The authors would like to thank all the individuals who generously gave their time to participate in this study. IO was funded by the Caja Madrid Foundation. MTH acknowledges funding support from the University of Murcia, which covered participant payments for this study, and from the Fundación Séneca (09629/EE2/08); neither funder had any influence on the study methodology.

Declarations of interest

None.

References (79)

  • Alderson RM et al. (2007) Attention-deficit/hyperactivity disorder and behavioral inhibition: a meta-analytic review of the stop-signal paradigm. J Abnorm Child Psychol.
  • Barraclough DJ et al. (2004) Prefrontal cortex and decision making in a mixed-strategy game. Nat Neurosci.
  • Bechara A (2005) Decision making, impulse control and loss of willpower to resist drugs: a neurocognitive perspective. Nat Neurosci.
  • Bembich S et al. (2014) Differences in time course activation of dorsolateral prefrontal cortex associated with low or high risk choices in a gambling task. Front Hum Neurosci.
  • Berger-Tal O et al. (2014) The exploration-exploitation dilemma: a multidisciplinary framework. PLoS One.
  • Bolla KI et al. (2004) Sex-related differences in a gambling task and its neurological correlates. Cereb Cortex.
  • Boorman ED, Rajendran VG, O’Reilly JX, Behrens TE (2016) Two anatomically and computationally distinct learning signals...
  • Brevers D, Bechara A, Cleeremans A, Noël X (2013) Iowa Gambling Task (IGT): twenty years after - gambling disorder and...
  • Busemeyer JR, Stout JC (2002) A contribution of cognitive decision models to clinical assessment: Decomposing...
  • Camus M et al. (2009) Repetitive transcranial magnetic stimulation over the right dorsolateral prefrontal cortex decreases valuations during food choices. Eur J Neurosci.
  • Chien JH, Colloca L, Korzeniewska A, Cheng JJ, Campbell CM, Hillis AE, Lenz FA (2017) Oscillatory EEG activity induced...
  • Clark L, Boileau I, Zack M (2019) Neuroimaging of reward mechanisms in Gambling disorder: an integrative review. Mol...
  • Cloninger CR et al. (1991) The Tridimensional Personality Questionnaire: U.S. normative data. Psychol Rep.
  • Collins AGE, Frank MJ (2014) Opponent actor learning (OpAL): Modeling interactive effects of striatal dopamine on...
  • Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans....
  • Diana M, Raij T, Melis M, Nummenmaa A, Leggio L, Bonci A (2017) Rehabilitating the addicted brain with transcranial...
  • Enokibara M, Trevizol A, Shiozawa P, Cordeiro Q (2016) Establishing an effective TMS protocol for craving in substance...
  • Evens R, Hoefler M, Biber K, Lueken U (2016) The Iowa Gambling Task in Parkinson’s disease: A meta-analysis on effects...
  • Fecteau S et al. (2007) Diminishing risk-taking behavior by modulating activity in the prefrontal cortex: a direct current stimulation study. J Neurosci.
  • Floden D, Alexander MP, Kubu CS, Katz D, Stuss DT (2008) Impulsivity and risk-taking behavior in focal frontal lobe...
  • Gianotti LR et al. (2009) Tonic activity level in the right prefrontal cortex predicts individuals’ risk taking. Psychol Sci.
  • HajiHosseini A, Holroyd CB (2015a) Reward feedback stimuli elicit high-beta EEG oscillations in human dorsolateral...
  • HajiHosseini A, Holroyd CB (2015b) Reward feedback stimuli elicit high-beta EEG oscillations in human dorsolateral...
  • HajiHosseini A, Holroyd CB (2015c) Sensitivity of frontal beta oscillations to reward valence but not probability....
  • Hare TA, Hakimi S, Rangel A (2014) Activity in dlPFC and its effective connectivity to vmPFC are associated with...
  • Hawthorne MJ, Pierce BH (2015) Disadvantageous Deck Selection in the Iowa Gambling Task: The Effect of Cognitive Load....
  • He Q, Chen M, Chen C, Xue G, Feng T, Bechara A (2016) Anodal Stimulation of the Left DLPFC Increases IGT Scores and...
  • Heekeren HR et al. (2004) A general mechanism for perceptual decision-making in the human brain. Nature.
  • Jahanshahi M et al. (2000) Transcranial magnetic stimulation studies of cognition: an emerging field. Exp Brain Res.