Research Article

A Causal Role for the Right Dorsolateral Prefrontal Cortex in Avoidance of Risky Choices and Making Advantageous Selections
Introduction
Adaptive behaviour requires alternating between and updating decision-making mechanisms. To this end, humans can either act directly on previous knowledge (exploitation) or avoid risky options by gaining additional decision time and accumulating further information (exploration). Exploration and exploitation are two parallel cognitive strategies that compete and integrate information relevant to a new context (Berger-Tal et al., 2014). Exploration is often seen as a value-dependent stochastic decision rule, in which motivational inputs such as incentive value or value sensitivity shape exploratory mechanisms, and it is highly dependent on goal-directed processes (Watson et al., 2018). If the agent can anticipate the correct value associated with a particular context, the optimal decision is the one that maximises outcome value. Given a good estimate of the upcoming reward, a good decision is the one that most benefits the agent (exploitation). Mediated through trial and error, initial exploitation requires learning and establishing the correct value function.
Importantly, when the context changes critically over time, the outcome value learned from previous experience is no longer optimal. Outcome estimates must be updated, and the agent should resume trial and error, i.e. make decisions that are not optimal given the current outcome value (exploration). Reinforcement learning relies on these two strategies, which cannot be deployed at once but only sequentially, thereby recruiting multiple cognitive domains to generate optimal behaviour. Hence, to behave adaptively in risky and uncertain contexts, complex cognitive computations that shift between exploration and exploitation are crucial to maximise the probability of success and to adapt ongoing behaviour.
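The exploit-explore trade-off described above can be illustrated with a minimal value-learning agent. This is a hypothetical sketch for exposition, not the model used in the study: a delta rule nudges each option's value toward the observed outcome, and a softmax inverse temperature (beta) controls how strongly the agent exploits the highest-valued option versus exploring alternatives.

```python
import math
import random

def delta_update(value, outcome, lr=0.2):
    # Delta rule: move the value estimate toward the observed outcome
    # by a fraction given by the learning rate (lr is an assumed value).
    return value + lr * (outcome - value)

def softmax_choice(values, beta=2.0):
    # Softmax decision rule: higher beta -> near-deterministic
    # exploitation; lower beta -> more stochastic exploration.
    weights = [math.exp(beta * v) for v in values]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(values) - 1
```

With beta near zero the agent samples all options almost uniformly (pure exploration); as beta grows, choices collapse onto the current value maximum (pure exploitation), mirroring the sequential shift between strategies described above.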
Multiple executive processes are required to orchestrate reinforcement-learning mechanisms in response to choice outcomes. In new contexts, the exploration option entails larger risk, in contrast to constant contexts where no exploration is required. Exploitation thus relies on safer, well-learned contingencies that generate less risk. Brain lesion studies (Floden et al., 2008) and neuroimaging (Li et al., 2010, Yu et al., 2014) have contributed much to investigating the brain-behaviour associations involved in executive control, risk avoidance and exploration across many reinforcement-learning tasks. The right dorsolateral prefrontal cortex (DLPFC) has been associated with several underlying functions that support exploration behaviours towards reward (Li et al., 2020), such as updating actions towards favourable choices (Morris et al., 2014) or when the variability of value predictions is important (Kahnt et al., 2011). Additionally, other regions such as the frontopolar cortex are recruited in explorative scenarios when arbitration is needed between multiple goals (Pollmann, 2016) or when one needs to plan a sequence of actions in order to gather useful information (Zajkowski et al., 2017). Meanwhile, when exploration tendencies and reward learning are demanded, increased activity of the left DLPFC suggests a role in overriding exploitation mechanisms and perseverative tendencies (Hare et al., 2014).
Importantly, DLPFC connectivity with the ventromedial prefrontal cortex (vmPFC) has been shown to compute stimulus value during decisions that required combined waiting strategies during exploitation (Hare et al., 2014, Nejati et al., 2018) and value computations (Sokol-Hessner et al., 2012). Recent results suggest the vmPFC executes decisions to explore or exploit with input from the DLPFC or ACC (Trudel et al., 2020), with the DLPFC recruited during positive uncertainty prediction during exploration.
Given that neuroimaging studies do not establish a direct functional relation to the observed behaviour, causal evidence (using TMS or tDCS) has proven informative on the roles of cortical regions in building exploit-explore preferences. Repetitive transcranial magnetic stimulation (rTMS) over cortical areas is a powerful tool to modulate cortical activity and make strong inferences about the causal involvement of brain areas in specific cognitive processes (Jahanshahi and Rothwell, 2000), similar to focal brain lesions (Floden et al., 2008, Obeso et al., 2014). Experiments using direct or alternating current stimulation of the DLPFC have shown differing effects on risk behaviours. Some report an effect only of the left DLPFC using the hot/cold Columbia Card Task, whereby anodal left/cathodal right stimulation decreased risk-taking in the ‘cold’ cognition version of the task (Pripfl et al., 2013), or using the Balloon Analogue Risk Task (Sela et al., 2012). Similar studies using rTMS indicate that the right (Knoch et al., 2006a, Knoch et al., 2006b, Tulviste and Bachmann, 2019) or left DLPFC (He et al., 2016) is directly involved in shaping and controlling risky behaviours during motor actions. Overall, the above evidence suggests that the DLPFC is involved in learning from uncertain outcomes and decision-making, in both exploration and exploitation, but the exact nature of its computations and the causal contribution of the left and right hemispheres remain largely unknown.
The Iowa Gambling Task (IGT; Bechara et al., 1994) involves risky decision-making that is conducive to the use of both exploitation and exploration strategies. Participants select cards from four different decks, with positive (advantageous decks) or negative (disadvantageous decks) consequences. They are told that each card is associated with different amounts of reward or punishment and that a total monetary amount will be collected based on their performance. Their objective is to earn as much money as possible through their choices and learning. Through trial and error, participants need to learn the best choice option amongst the four decks. Typical behaviour on the task is to sample cards from each deck until stability and outcome learning are reached. Exploration plays a pivotal and often overlooked role in maximising the final pay-off in the IGT, where choice perseverance is a detrimental strategy. However, one critical aspect of the IGT is that reward contingencies are fixed, which can lead participants to reduce exploration and increase monetary payoffs by consolidating exploitative behaviour. Healthy participants show adaptive learning and selection of adequate decks around trials 40–50 of the task (Steingroever et al., 2015), with varying degrees of attentional effort modulating learning from outcomes (Hawthorne and Pierce, 2015), while clinical samples show marked decision-making deficits in reward learning (Stout et al., 2001, Shurman et al., 2005, Evens et al., 2016). Hence, the possibility of differentiating between reliance on exploration and exploitation motivated our selection of the IGT for the current study.
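The fixed-contingency structure described above can be sketched as follows. This is a simplified approximation of the classic Bechara et al. (1994) schedule (the exact card sequences differ): the high-gain decks A and B carry a negative expected value per card, while the low-gain decks C and D carry a positive one, so learners who keep exploiting A/B lose money over trials.

```python
import random

# Simplified IGT payoff structure (approximation, not the exact
# published card sequences). Decks A/B are disadvantageous
# (large gains, larger expected losses); C/D are advantageous.
DECKS = {
    "A": {"gain": 100, "loss": 250,  "p_loss": 0.5},  # EV per card: -25
    "B": {"gain": 100, "loss": 1250, "p_loss": 0.1},  # EV per card: -25
    "C": {"gain": 50,  "loss": 50,   "p_loss": 0.5},  # EV per card: +25
    "D": {"gain": 50,  "loss": 250,  "p_loss": 0.1},  # EV per card: +25
}

def draw(deck, rng=random):
    # Draw one card: always receive the gain; probabilistically
    # incur the deck's loss on the same card.
    card = DECKS[deck]
    outcome = card["gain"]
    if rng.random() < card["p_loss"]:
        outcome -= card["loss"]
    return outcome
```

Note how decks B and D conceal their expected losses in rare, large penalties, which is precisely why sustained exploration, rather than perseverance on early big wins, is needed to learn the advantageous decks.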
The IGT is a well-recognised task for obtaining measures of learning and adaptive decision-making. Yet the underlying neurocognitive processes of exploration and exploitation are difficult to unravel from behavioural data alone. A new computational architecture (Value plus Sequential Exploration, VSE model; Ligneul, 2019) enabled us to distinguish two different forms of exploration while simultaneously accounting for value sensitivity and outcome (Wilson et al., 2014). Previous cognitive models of the IGT were designed to identify valence effects and possible perseverative patterns of behaviour. The Expectancy-Valence Learning (EVL) model (Busemeyer and Stout, 2002) was used alongside the Prospect Valence Learning model with Delta rule (PVL-Delta) to obtain precise long-term prediction accuracy and parameter recovery (Steingroever et al., 2013, Steingroever et al., 2014). Meanwhile, the Value-Plus-Perseverance (VPP) model shows excellent short-term prediction accuracy (Ahn et al., 2014). In contrast, we aimed to disentangle value-independent directed exploration. This may capture a prevalent tendency of participants to sample options that are more uncertain, have never been selected, or have not been selected for a long time. The new architecture has previously been shown to outperform alternative models in the field, using an independent dataset of 504 participants (Steingroever et al., 2015, Ligneul, 2019).
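The idea of value-independent directed exploration can be illustrated with a small sketch. This is a hypothetical illustration, not the published VSE equations: each deck's choice weight combines its learned value with an exploration bonus that grows with the number of trials since the deck was last sampled, scaled by a directed-exploration parameter (here called phi, an assumed name).

```python
import math

def choice_probabilities(values, trials_since_chosen, phi=0.5, beta=1.0):
    # Combine learned value with a value-independent exploration bonus:
    # decks not sampled recently gain utility in proportion to phi.
    # Functional form and parameter names are illustrative assumptions.
    utilities = [v + phi * t for v, t in zip(values, trials_since_chosen)]
    weights = [math.exp(beta * u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]
```

Because the bonus depends only on sampling history, not on outcomes, two decks with identical learned values are not chosen equally often: the one neglected for longer attracts more choices, which is the signature of directed exploration this class of model is designed to capture.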
To test whether the DLPFC plays a causal role in exploratory strategy, we used continuous theta burst stimulation (cTBS) to temporarily interfere with the neural activity of this region in healthy participants and measured exploration-based learning on the IGT. Thanks to the independent tracking of exploitation and exploration processes, together with value sensitivity to choice outcome, the VSE model combined with TMS provides advanced mechanistic insight into the causal role of the DLPFC in controlling exploration. Hence, we hypothesised that cTBS over the right DLPFC (compared to sham stimulation) would synergistically affect directed exploration and sensitivity to reinforcers, promoting exploitative behaviour in line with its role in updating behaviour towards favourable choices (Morris et al., 2014). In contrast, we predicted that cTBS over the left DLPFC would influence exploitation mechanisms due to its role in overriding and controlling exploitation options (Hare et al., 2014).
Participants
Forty-two right-handed healthy Caucasian participants were paid to participate in the study. Participants were recruited through advertisement and were assessed at the UCL Queen Square Institute of Neurology by the same investigators (IO, MTH). None of the participants had a history of physical, neurological or psychiatric illness, drug, alcohol abuse or head injury. Prior to participation all were screened with the TMS safety questionnaire (Keel et al., 2001) and informed consent was obtained
Results
Groups did not differ in terms of age (F(2, 31) = 1.60, p > .05), IQ (F(2, 31) = 1.19, p > .05) or sex distribution (χ²(1) = 0.15, p > .05). No effects of personality or impulsivity traits were found on the results. A one-way ANOVA on the BIS with Group (right DLPFC vs. left DLPFC vs. sham M1) revealed no main effect (F(2, 31) = 0.59, p = .55). Similarly, a one-way ANOVA on the measure of personality (Tridimensional Personality Questionnaire, TPQ) showed no differences between groups (right DLPFC vs. left
Discussion
This study reveals the specific computations of the DLPFC to successfully adapt behaviour in uncertain contexts. A combined non-invasive brain stimulation (cTBS) and model-based approach was used to uncover the role of the DLPFC (both right and left) in directed exploration on the IGT. As expected, a clear reduction of directed exploration was observed after right DLPFC stimulation and to a lesser extent after left DLPFC stimulation. Computational modelling showed that this decrease was driven
Acknowledgements
The authors would like to thank all the individuals who generously gave their time to participate in this study. IO was funded by the Caja Madrid Foundation. MTH acknowledges the funding support of the University of Murcia for payment of the participants in this study and of the Fundación Séneca 09629/EE2/08, neither of which influenced the study's methodology.
Declarations of interest
None.
References (79)
- et al. Rostrolateral prefrontal cortex and individual differences in uncertainty-driven exploration. Neuron (2012)
- et al. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition (1994)
- et al. The contributions of lesion laterality and lesion volume to decision-making impairment following frontal lobe damage. Neuropsychologia (2003)
- et al. Paradoxical effects of education on the Iowa Gambling Task. Brain Cogn (2004)
- et al. Theta burst stimulation of the human motor cortex. Neuron (2005)
- et al. A safety screening questionnaire for transcranial magnetic stimulation. Clin Neurophysiol (2001)
- et al. A general mechanism for decision-making in the human brain? Trends Cogn Sci (2005)
- et al. Inter- and intra-subject variability of motor cortex plasticity following continuous theta-burst stimulation. Neuroscience (2015)
- et al. Use and safety of a new repetitive transcranial magnetic stimulator. Electroencephalogr Clin Neurophysiol (1996)
- et al. Decision-making in stimulant and opiate addicts in protracted abstinence: evidence from computational modeling with pure users. Front Psychol (2014)