Abstract
Addiction is sometimes argued to represent an extreme dominance of habitual behaviour, driven by stimulus–response associations, over goal-directed behaviour, involving planning based on action–outcome contingencies. In this chapter, we formalize a recent elaboration on this “habit theory” of addiction using Bayesian reinforcement learning algorithms as models of habit and planning. In these models, compulsivity and intertemporal impatience, both considered important elements of addiction, can arise through a dominance of habit over planning, but only on the assumption that the planning system does not overvalue addictive rewards. That is, the habit theory of addiction implicitly assumes that the planning system, in contrast to the habit system, ascribes appropriately high values to non-addictive rewards and appropriately low values to addictive rewards. However, recent evidence suggests that goal-directed overvaluation of addictive rewards is a key driver of addiction, which presents a significant challenge for the habit theory. We discuss whether this challenge will prove to be insurmountable.
Notes
1. This minimal value is \(\sigma ^2_r + \epsilon \), where \(\epsilon \) depends on the forgetting rate \(w\) such that \(w = 1\) implies \(\epsilon = 0\).
2. On this view, dopamine transmission can be taken to encode \(1/\sigma ^{2}_r\) rather than \(\delta \) in the current learning equations. Thus, with increased dopamine transmission, \(\sigma ^2_r\) decreases and \(\alpha \), the “learning rate”, increases.
3. It is perhaps surprising that \(\mathrm {Var}[R_s]\) depends only on the relative values of \(\theta _{s'}\), i.e., that multiplying \(\boldsymbol {\theta }\) by some scalar \(k > 1\) does not decrease \(\mathrm {Var}[R_s]\). This is because the variance is a linear function of the mixture weights; the variances of the beta-binomial and Dirichlet-multinomial distributions are similarly dependent only on the relative values of their concentration parameters when the number of trials is 1.
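The analogy drawn in Note 3 can be checked numerically. The sketch below is only an illustration of the beta-binomial and Dirichlet-multinomial cases mentioned in the note, not the chapter's own model of \(\mathrm {Var}[R_s]\). It uses the standard textbook variance formulas, and the function names and parameter values are hypothetical, chosen for illustration. With a single trial, scaling the concentration parameters by \(k\) leaves the variance unchanged; with more than one trial, it does not.

```python
from fractions import Fraction


def beta_binomial_variance(a, b, n):
    """Variance of a beta-binomial count with concentration parameters a, b and n trials.

    Standard formula: n*a*b*(a+b+n) / ((a+b)^2 * (a+b+1)).
    """
    a, b = Fraction(a), Fraction(b)
    total = a + b
    return n * a * b * (total + n) / (total ** 2 * (total + 1))


def dirichlet_multinomial_variance(theta, i, n):
    """Variance of the i-th count of a Dirichlet-multinomial with n trials.

    Standard formula: n*p*(1-p)*(total+n)/(total+1), where p = theta[i]/total.
    """
    total = sum(Fraction(t) for t in theta)
    p = Fraction(theta[i]) / total
    return n * p * (1 - p) * (total + n) / (total + 1)


# Illustrative (hypothetical) concentration parameters and scale factor.
k = 10
a, b = 2, 3
theta = [1, 2, 3]

# One trial: the (total + n) and (total + 1) factors cancel, so only the
# relative sizes of the concentration parameters matter.
one_trial_unscaled = beta_binomial_variance(a, b, 1)
one_trial_scaled = beta_binomial_variance(k * a, k * b, 1)

# Several trials: scaling the parameters up now reduces the variance.
five_trials_unscaled = beta_binomial_variance(a, b, 5)
five_trials_scaled = beta_binomial_variance(k * a, k * b, 5)

dm_unscaled = dirichlet_multinomial_variance(theta, 0, 1)
dm_scaled = dirichlet_multinomial_variance([k * t for t in theta], 0, 1)

print(one_trial_unscaled, one_trial_scaled)
print(five_trials_unscaled, five_trials_scaled)
print(dm_unscaled, dm_scaled)
```

Exact rational arithmetic (`fractions.Fraction`) is used here so that the equality in the one-trial case is not blurred by floating-point rounding.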
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kinley, I., Becker, S. (2024). Impulsivity and Compulsivity in Bayesian Reinforcement Learning Models of Addiction: A Computational Critique of the Habit Theory. In: Vandaele, Y. (eds) Habits. Springer, Cham. https://doi.org/10.1007/978-3-031-55889-4_13
DOI: https://doi.org/10.1007/978-3-031-55889-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-55888-7
Online ISBN: 978-3-031-55889-4
eBook Packages: Biomedical and Life Sciences (R0)