Impulsivity and Compulsivity in Bayesian Reinforcement Learning Models of Addiction: A Computational Critique of the Habit Theory

Kinley, Isaac; Becker, Suzanna

doi:10.1007/978-3-031-55889-4_13

Abstract

Addiction is sometimes argued to represent an extreme dominance of habitual behaviour, driven by stimulus–response associations, over goal-directed behaviour, involving planning based on action–outcome contingencies. In this chapter, we formalize a recent elaboration on this “habit theory” of addiction using Bayesian reinforcement learning algorithms as models of habit and planning. In these models, compulsivity and intertemporal impatience, both considered important elements of addiction, can arise through a dominance of habit over planning, but only on the assumption that the planning system does not overvalue addictive rewards. That is, the habit theory of addiction implicitly assumes that the planning system, in contrast to the habit system, ascribes appropriately high values to non-addictive rewards and appropriately low values to addictive rewards. However, recent evidence suggests that goal-directed overvaluation of addictive rewards is a key driver of addiction, which presents a significant challenge for the habit theory. We discuss whether this challenge will prove to be insurmountable.

Notes

1. This minimal value is \(\sigma^2_r + \epsilon\), where \(\epsilon\) depends on the forgetting rate \(w\) such that \(w = 1\) implies \(\epsilon = 0\).
2. On this view, dopamine transmission can be taken to encode \(1/\sigma^2_r\) rather than \(\delta\) in the current learning equations. Thus, with increased dopamine transmission, \(\sigma^2_r\) decreases and \(\alpha\), the “learning rate”, increases.
3. It is perhaps surprising that \(\mathrm{Var}[R_s]\) depends only on the relative values of \(\theta_{s'}\), i.e., that multiplying \(\boldsymbol{\theta}\) by some scalar \(k > 1\) does not decrease \(\mathrm{Var}[R_s]\). This is because the variance is a linear function of the mixture weights; the variances of the beta-binomial and Dirichlet-multinomial distributions are similarly dependent only on the relative values of their concentration parameters when the number of trials is 1.
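
As a rough numerical check of Notes 2 and 3, the following Python sketch may help. It is not taken from the chapter. The function names are hypothetical, the variance formulas are the standard ones for the beta-binomial and Dirichlet-multinomial distributions, and gaussian_learning_rate assumes a standard Gaussian (Kalman-style) update in which the learning rate is the prior variance divided by the sum of the prior and reward variances, which may differ from the chapter's own learning equations.

```python
from fractions import Fraction


def beta_binomial_variance(alpha, beta, n=1):
    """Variance of a beta-binomial count with integer concentration
    parameters (alpha, beta) and n trials (standard textbook formula)."""
    total = alpha + beta
    return Fraction(n * alpha * beta * (total + n), total**2 * (total + 1))


def dirichlet_multinomial_variance(concentration, i, n=1):
    """Variance of component i of a Dirichlet-multinomial count vector
    with integer concentration parameters and n trials."""
    total = sum(concentration)
    p = Fraction(concentration[i], total)
    return n * p * (1 - p) * Fraction(total + n, total + 1)


def gaussian_learning_rate(prior_var, reward_var):
    """ASSUMPTION: Kalman-style gain for a Gaussian value estimate.
    prior_var is the uncertainty about the value, reward_var plays the
    role of sigma^2_r in Note 2. Not necessarily the chapter's equation."""
    return prior_var / (prior_var + reward_var)


if __name__ == "__main__":
    k = 10  # scale factor applied to every concentration parameter

    # Note 3: with a single trial, scaling the parameters leaves the variance unchanged.
    print(beta_binomial_variance(2, 3), beta_binomial_variance(2 * k, 3 * k))
    print(dirichlet_multinomial_variance([1, 2, 3], 0),
          dirichlet_multinomial_variance([1 * k, 2 * k, 3 * k], 0))

    # With more than one trial, the scale of the parameters does matter.
    print(beta_binomial_variance(2, 3, n=5), beta_binomial_variance(2 * k, 3 * k, n=5))

    # Note 2: a smaller reward variance gives a larger learning rate.
    for reward_var in (4.0, 1.0, 0.25):
        print(reward_var, gaussian_learning_rate(1.0, reward_var))
```

With one trial, both variance functions reduce to p(1 - p), where p is the relative size of the concentration parameter, so only the ratios between parameters matter. Exact rational arithmetic (fractions.Fraction) is used so that the scale invariance shows up as exact equality rather than approximate floating-point agreement.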

Author information

Authors and Affiliations

Department of Psychology, Neuroscience & Behaviour, McMaster University, Hamilton, ON, Canada
Isaac Kinley & Suzanna Becker

Corresponding author

Correspondence to Isaac Kinley.

Editor information

Editors and Affiliations

Laboratoire de Neurosciences Expérimentales et Cliniques, Université de Poitiers, INSERM, U-1084, Poitiers, France
Youna Vandaele

About this chapter

Cite this chapter

Kinley, I., Becker, S. (2024). Impulsivity and Compulsivity in Bayesian Reinforcement Learning Models of Addiction: A Computational Critique of the Habit Theory. In: Vandaele, Y. (eds) Habits. Springer, Cham. https://doi.org/10.1007/978-3-031-55889-4_13

DOI: https://doi.org/10.1007/978-3-031-55889-4_13
Published: 23 February 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55888-7
Online ISBN: 978-3-031-55889-4
eBook Packages: Biomedical and Life Sciences (R0)
