Impulsivity and Compulsivity in Bayesian Reinforcement Learning Models of Addiction: A Computational Critique of the Habit Theory

Habits

Abstract

Addiction is sometimes argued to represent an extreme dominance of habitual behaviour, driven by stimulus–response associations, over goal-directed behaviour, involving planning based on action–outcome contingencies. In this chapter, we formalize a recent elaboration on this “habit theory” of addiction using Bayesian reinforcement learning algorithms as models of habit and planning. In these models, compulsivity and intertemporal impatience, both considered important elements of addiction, can arise through a dominance of habit over planning, but only on the assumption that the planning system does not overvalue addictive rewards. That is, the habit theory of addiction implicitly assumes that the planning system, in contrast to the habit system, ascribes appropriately high values to non-addictive rewards and appropriately low values to addictive rewards. However, recent evidence suggests that goal-directed overvaluation of addictive rewards is a key driver of addiction, which presents a significant challenge for the habit theory. We discuss whether this challenge will prove to be insurmountable.

Notes

  1. This minimal value is \(\sigma^2_r + \epsilon\), where \(\epsilon\) depends on the forgetting rate \(w\) such that \(w = 1\) implies \(\epsilon = 0\).

  2. On this view, dopamine transmission can be taken to encode \(1/\sigma^2_r\) rather than \(\delta\) in the current learning equations. Thus, with increased dopamine transmission, \(\sigma^2_r\) decreases and \(\alpha\), the “learning rate”, increases.

  3. It is perhaps surprising that \(\mathrm{Var}[R_s]\) depends only on the relative values of \(\theta_{s'}\), i.e., that multiplying \(\boldsymbol{\theta}\) by some scalar \(k > 1\) does not decrease \(\mathrm{Var}[R_s]\). This is because the variance is a linear function of the mixture weights; the variances of the beta-binomial and Dirichlet-multinomial distributions are similarly dependent only on the relative values of their concentration parameters when the number of trials is 1.

Author information

Corresponding author

Correspondence to Isaac Kinley.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Kinley, I., Becker, S. (2024). Impulsivity and Compulsivity in Bayesian Reinforcement Learning Models of Addiction: A Computational Critique of the Habit Theory. In: Vandaele, Y. (eds) Habits. Springer, Cham. https://doi.org/10.1007/978-3-031-55889-4_13