Computational theory-driven studies of reinforcement learning and decision-making in addiction: what have we learned?

doi:10.1016/j.cobeha.2020.08.007

Current Opinion in Behavioral Sciences

Volume 38, April 2021, Pages 40-48

https://doi.org/10.1016/j.cobeha.2020.08.007

Highlights

• Computational psychiatry holds promise for mechanistic discovery in addiction.
• This approach captures latent factors driving behavioral differences from health.
• Emerging support also for capturing variation defining addiction cycles and states.
• Research needs to better account for the heterogeneous, dynamic nature of addiction.
• Expanding the parameter space examined and duration of observation will be key.

Computational psychiatry provides a powerful new approach for linking the behavioral manifestations of addiction to their precise cognitive and neurobiological substrates. However, this emerging area of research is still limited in important ways. While research has identified features of reinforcement learning and decision-making in substance users that differ from health, less emphasis has been placed on capturing addiction cycles/states dynamically, within-person. In addition, the focus on few behavioral variables at a time has precluded more detailed consideration of related processes and heterogeneous clinical profiles. We propose that a longitudinal and multidimensional examination of value-based processes, a type of dynamic ‘computational fingerprint’, will provide a more complete understanding of addiction as well as aid in developing better tailored and timed interventions.

Introduction

Reinforcement learning and decision-making (collectively, ‘value-based decision-making’ [1]) are integral to adaptive behavior in everyday life. Value-based decision-making comprises a feedback loop whereby the values of candidate actions are learned and updated through experience, and used to guide behavior that maximizes utility (and minimizes disutility). Disruption in value-based decision-making is considered a key factor in the development and maintenance of addiction [2, 3, 4], across people with substance use disorders (SUD) [5] and laboratory animals exposed to drugs of abuse [6,7], but the specific contributing mechanisms remain unknown. Decision-making biases in addiction may be due to disruption in distinct components of learning, such as error encoding or value updating, or to subjective preferences that are not readily observable in coarse behavioral performance measures. The nascent field of computational psychiatry applies formal models to understand the precise mechanisms (or ‘failure modes’) that give rise to pathological behavior in psychiatric conditions [8,9,10••]. While there is no consensus on what qualifies as computational psychiatry, here we take this term to mean a mathematically rigorous understanding of the latent drivers of behavior. Findings from theory-driven computational psychiatry [11] suggest that models focusing on algorithmic processes of value-based decision-making (Box 1) are well-suited to identify the specific components of reinforcement learning and decision-making that characterize SUD. This is exciting because such mechanistic research can bridge the behavioral manifestations of SUD with underlying neurobiology, providing fertile ground for cross-species translation [12, 13, 14, 15, 16]. Computational theoretical models thus hold promise as tools to provide additional mechanistic insight into SUD diagnosis and prognosis, and to help guide personalized treatments based on the latent variables governing individual behavior.
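To make the feedback loop described above concrete, the following is a minimal sketch of one commonly used formalization: a delta-rule learner with a softmax choice rule on a two-option probabilistic reward task. It is an illustration only and is not taken from the studies reviewed here. The task, the function names, and the parameter names (`alpha` for the learning rate, `beta` for the inverse temperature) are assumptions made for this example.

```python
import math
import random


def softmax_choice_prob(q_values, beta):
    """Return the probability of choosing each action under a softmax rule.

    beta is an assumed 'inverse temperature': higher values make choices
    follow the learned values more deterministically.
    """
    max_q = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (q - max_q)) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate_agent(reward_probs, alpha, beta, n_trials, seed=0):
    """Simulate a delta-rule learner on a hypothetical bandit task.

    reward_probs: probability that each action pays out a reward of 1.
    alpha: assumed learning rate in (0, 1].
    """
    rng = random.Random(seed)
    q_values = [0.0] * len(reward_probs)  # learned action values
    choices, rewards = [], []
    for _ in range(n_trials):
        # Decision: learned values guide which action is taken.
        probs = softmax_choice_prob(q_values, beta)
        action = rng.choices(range(len(q_values)), weights=probs)[0]
        # Outcome: a probabilistic reward is delivered.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        # Error encoding: discrepancy between outcome and expectation.
        prediction_error = reward - q_values[action]
        # Value updating: move the estimate a fraction alpha toward the outcome.
        q_values[action] += alpha * prediction_error
        choices.append(action)
        rewards.append(reward)
    return q_values, choices, rewards


if __name__ == "__main__":
    q, choices, rewards = simulate_agent([0.8, 0.2], alpha=0.3, beta=5.0, n_trials=200)
    print("final values:", q)
    print("proportion choosing option 0:", choices.count(0) / len(choices))
```

In a sketch like this, a disruption in "error encoding" or "value updating" would correspond to a change in how `prediction_error` is computed or in `alpha`, while changes in `beta` alter how strongly learned values drive choice. Two agents can therefore produce similar coarse performance for different underlying reasons.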

Here, we review recent theory-driven computational psychiatry studies of SUD primarily conducted with human subjects, highlighting the ways in which these studies have extended and refined our understanding of value-based decision-making processes in addiction. We focus on two key objectives of this work: to identify deviations from health (via case-control comparisons), and to map specific SUD symptoms and clinically relevant states onto specific model variables — the latter aimed at moving closer to understanding the most defining yet most elusive aspect of the disorder: its dynamic, cyclical course. We conclude by outlining two directions for future research. We propose that a holistic approach that expands the typical parameter space examined within the same individual, and the duration of observation, may better serve these critical objectives and significantly enhance the clinical impact of computational psychiatry for addiction applications.
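As a rough illustration of how individual behavior might be mapped onto model variables, the sketch below estimates a learning rate and an inverse temperature for one participant by minimizing the negative log-likelihood of their observed choices over a coarse grid. This is a simplified, assumed procedure for illustration; the studies reviewed here use a variety of models and estimation methods (for example, hierarchical Bayesian fitting), and all function and parameter names below are hypothetical.

```python
import math


def negative_log_likelihood(choices, rewards, alpha, beta, n_actions=2):
    """Negative log-likelihood of a choice sequence under a delta-rule
    learner with softmax choice (an assumed model for this example)."""
    q = [0.0] * n_actions
    nll = 0.0
    for action, reward in zip(choices, rewards):
        # Log of the softmax normalizer, computed stably (log-sum-exp).
        max_q = max(q)
        log_norm = beta * max_q + math.log(
            sum(math.exp(beta * (v - max_q)) for v in q)
        )
        # Log-probability of the action that was actually chosen.
        nll -= beta * q[action] - log_norm
        # Update the chosen action's value from the prediction error.
        q[action] += alpha * (reward - q[action])
    return nll


def fit_grid(choices, rewards, alphas, betas):
    """Return (nll, alpha, beta) for the best-fitting grid point."""
    best = None
    for alpha in alphas:
        for beta in betas:
            nll = negative_log_likelihood(choices, rewards, alpha, beta)
            if best is None or nll < best[0]:
                best = (nll, alpha, beta)
    return best


if __name__ == "__main__":
    # Hypothetical data: one participant's choices (0 or 1) and outcomes.
    choices = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
    rewards = [1, 0, 1, 1, 0, 0, 1, 1, 1, 1]
    alphas = [0.1, 0.3, 0.5, 0.7, 0.9]
    betas = [0.5, 1.0, 2.0, 5.0, 10.0]
    print(fit_grid(choices, rewards, alphas, betas))
```

Fitted values of this kind are what a case-control comparison would contrast between groups, or what a within-person analysis would relate to symptoms or clinical states. A grid search is used here only because it is easy to read; gradient-based or Bayesian estimation is more common in practice.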

Section snippets

Deviation from health as indication of psychopathology: diagnostic differences between addicted and healthy individuals

SUD is a chronic, relapsing disorder characterized by repeated periods of drug craving, intoxication, bingeing, and withdrawal [17]. Drug use is maintained despite harmful consequences. The reinforcing and addictive effects of drugs center on the brain’s reward (or ‘valuation’ [18]) circuit. At the core of this circuit lie the dopaminergic pathways originating from the midbrain (ventral tegmental area and substantia nigra) and projecting onto the striatum and prefrontal cortex (orbitofrontal

Capturing addiction dynamics: using computational models to understand within-person variability, symptom expression, prognosis, and treatment

Addiction is not static, and indeed, it can be said that understanding addiction’s longitudinal course is to understand addiction itself. The ‘addiction cycle’ has been described as having three stages: preoccupation-anticipation, bingeing-intoxication, and withdrawal-negative affect [22,62, 63, 64]. These stages are likely associated with distinct value-based processes. Although no research to date has identified the algorithmic mechanisms that underlie the transition between each stage,

Conclusion and future directions

Computational psychiatry has garnered considerable attention in recent years but enthusiasm for its presumed clinical utility is rightly tempered [82]. Here, we review the promise of this approach for addiction applications. While computationally informed studies have produced novel explanatory insights about value-based processes in addiction that help to refine long-held theoretical accounts, we also identified two directions for future research that could significantly enhance the clinical

Conflict of interest statement

Nothing declared.

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

• of special interest
•• of outstanding interest

Acknowledgements

The authors acknowledge funding from the Brain and Behavior Research Foundation (BBRF NARSAD Grant #25387), Busch Biomedical Research Program, and NIH/NIDA (DA043676). Special thanks to the Addiction and Decision Neuroscience Laboratory members, Silvia Lopez-Guzman, and Paul W. Glimcher for helpful discussions.

References (82)

Q.J. Huys et al. The role of learning-related dopamine signals in addiction vulnerability. Prog Brain Res (2014)
W.K. Bickel et al. 21st century neurobehavioral theories of decision making in addiction: review and evaluation. Pharmacol Biochem Behav (2018)
G. Schoenbaum et al. Orbitofrontal cortex, decision-making and drug addiction. Trends Neurosci (2006)
S.H. Ahmed. Individual decision-making in the causal pathway to addiction: contributions and limitations of rodent models. Pharmacol Biochem Behav (2018)
P.R. Montague et al. Computational psychiatry. Trends Cogn Sci (2012)
X.J. Wang et al. Computational psychiatry. Neuron (2014)
S. Liu et al. Translation of computational psychiatry in the context of addiction. JAMA Psychiatry (2020)
B.M. Sweis et al. Beyond simple tests of value: measuring addiction as a heterogeneous disease of computation-specific valuation processes. Learn Mem (2018)
H.M. Bayer et al. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron (2005)
W. Schultz et al. A neural substrate of prediction and reward. Science (1997)
R.Z. Goldstein et al. Dysfunction of the prefrontal cortex in addiction: neuroimaging findings and clinical implications. Nat Rev Neurosci (2011)
L. Deserno et al. Chronic alcohol intake abolishes the relationship between dopamine synthesis capacity and learning signals in the ventral striatum. Eur J Neurosci (2015)
E.J. Rose et al. Temporal difference error prediction signal dysregulation in cocaine dependence. Neuropsychopharmacology (2014)
M. Amlung et al. Steep delay discounting and addictive behavior: a meta-analysis of continuous associations. Addiction (2017)
R.Z. Goldstein et al. Drug addiction and its underlying neurobiological basis: neuroimaging evidence for the involvement of the frontal cortex. Am J Psychiatry (2002)
T.E. Baker et al. Smoking decisions: altered reinforcement learning signals induced by nicotine state. Nicotine Tob Res (2020)
X. Gu et al. A Bayesian observer model of drug craving. JAMA Psychiatry (2017)
S.M. Groman et al. Dysregulation of decision making related to metabotropic glutamate 5, but not midbrain D3, receptor availability following cocaine self-administration in rats. Biol Psychiatry (2020)
M. Browning et al. Realizing the clinical potential of computational psychiatry: report from the Banbury Center Meeting, February 2019. Biol Psychiatry (2020)
A. Rangel et al. A framework for studying the neurobiology of value-based decision making. Nat Rev Neurosci (2008)
A.D. Redish et al. A unified framework for addiction: vulnerabilities in the decision process. Behav Brain Sci (2008)
A.D. Redish. Addiction as a computational process gone awry. Science (2004)
Q.J. Huys et al. Computational psychiatry as a bridge from neuroscience to clinical applications. Nat Neurosci (2016)
T.V. Maia et al. Theory-based computational psychiatry. Biol Psychiatry (2017)
A. Belin-Rauscent et al. How preclinical models evolved to resemble the diagnostic criteria of drug addiction. Biol Psychiatry (2016)
S.M. Groman. The neurobiology of impulsive decision-making and reinforcement learning in nonhuman animals. Curr Top Behav Neurosci (2020)
S.M. Groman. Investigating the computational underpinnings of addiction. Neuropsychopharmacology (2019)
DSM5. Diagnostic and Statistical Manual of Mental Disorders (2013)
O. Bartra et al. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage (2013)
P. Trifilieff et al. Blunted dopamine release as a biomarker for vulnerability for substance use disorders. Biol Psychiatry (2014)
N.D. Volkow et al. Neurobiologic advances from the brain disease model of addiction. N Engl J Med (2016)
M. Luijten et al. Disruption of reward processing in addiction: an image-based meta-analysis of functional magnetic resonance imaging studies. JAMA Psychiatry (2017)
A.B. Konova et al. Role of the value circuit in addiction and addiction treatment
P.H. Chiu et al. Smokers’ brains compute, but ignore, a fictive error signal in a sequential investment task. Nat Neurosci (2008)
S.Q. Park et al. Prefrontal cortex fails to learn from reward prediction errors in alcohol dependence. J Neurosci (2010)
J. Tanabe et al. Reduced neural tracking of prediction error in substance-dependent individuals. Am J Psychiatry (2013)
V.B. Gradin et al. Abnormal brain activity during a reward and loss task in opiate-dependent patients receiving methadone maintenance therapy. Neuropsychopharmacology (2014)
C.E. Myers et al. Probabilistic reward- and punishment-based learning in opioid addiction: experimental and computational data. Behav Brain Res (2016)
J.W. Kanen et al. Computational modelling reveals contrasting effects on reinforcement learning and cognitive flexibility in stimulant use disorder and obsessive-compulsive disorder: remediating effects of dopaminergic D2/3 receptor agents. Psychopharmacology (Berl) (2019)
E.J. Rose et al. Chronic exposure to nicotine is associated with reduced reward-related activity in the striatum but not the midbrain. Biol Psychiatry (2012)
L.S. Morris et al. Biases in the explore-exploit tradeoff in addictions: the role of avoidance of uncertainty. Neuropsychopharmacology (2016)
