Theory-driven computational models of drug addiction in humans: Fruitful or futile?

Maladaptive behavior in drug addiction is widely regarded as a result of neurocognitive dysfunctions. Recently, there has been a growing trend to adopt computational methods to study these dysfunctions in drug-addicted patients, not least because they provide a quantitative framework for inferring the psychological mechanisms that may have gone awry in addiction. We therefore sought to evaluate the extent to which these theory-driven computational models have fulfilled this purpose in addiction research. We discuss several learning and decision-making theories proposed to explain symptoms that characterize impaired control and the intense urge to use drugs in addiction, and outline the computational algorithms frequently used to model these processes. Specifically, impaired behavioral control over drugs has been explained by aberrant reinforcement learning algorithms and an imbalance between model-based and model-free control, whereas the strong desire for drugs might be explained by a neurocomputational model of incentive sensitization and behavioral economic theory. We argue that while theory-driven computational models may appear to be useful tools that generate novel mechanistic insights into drug addiction, their use should be informed by psychological theory, experimental data, and clinical observations.


Introduction
Drug addiction, a severe form of substance use disorder, is characterized by dysfunctional patterns of drug use that persist at the expense of the users' health and wellbeing. Once conceived as a moral failure [1] and later conceptualized as a physical and psychological dependence on addictive drugs [2], drug addiction is now widely regarded as a neuropsychiatric disorder with clear biological underpinnings [3,4]. Drug-addicted patients typically show a range of maladaptive behaviors (see Table 1), such as a loss of control over drug use that continues even in the face of negative consequences, or a strong desire to use drugs rather than engage in other pleasurable activities. Several influential theories have sought to explain the mechanisms underpinning these symptoms. For example, theories of impaired reinforcement learning and instrumental control focus on the maladaptive behavior typically seen in drug-addicted patients. They postulate that the cognitive processes regulating adaptive actions become impaired, which may explain why drug use spirals out of control and continues even in the face of adverse consequences [5][6][7][8]. Behavioral economic theories and the incentive sensitization theory of addiction [9][10][11][12], on the other hand, address the intense desire for drugs and the strong preference for drugs over other forms of reward. They postulate that drugs and associated stimuli have acquired greater subjective value than other stimuli, such that they overshadow non-drug-related alternatives.
As sophisticated analytical methods have become more popular, many researchers have begun to use mathematical models to elucidate the possible mechanisms that underlie psychiatric symptoms. This has given rise to a new field, computational psychiatry [13][14][15]. Methods within this domain can be either data-driven, i.e. using machine learning or dimensionality reduction techniques to uncover hidden patterns within a dataset without reference to theory or prior knowledge (a bottom-up approach), or theory-driven, i.e. using existing knowledge to build computational models that test mechanistic hypotheses about a given disorder (a top-down approach). The latter is more common in the realm of drug addiction. Theory-driven computational models can break down overt behavior into its subcomponents, allowing more precise inferences about the cognitive components implicated in the disorder. Several different computational models have been deployed to study addictive behaviors, but the type of model applied in research studies very much depends on the addiction theory in question.

Table 1
Symptoms of substance use disorder based on the Diagnostic and Statistical Manual (5th edition). Each DSM-5 criteria grouping is followed by the DSM-5 symptoms it contains.

Impaired control
• Taking substances in larger amounts than intended
• Unsuccessful attempts to cut down on substance use
• Great deal of time devoted to obtaining substances, using substances, or recovering from their effects
• Intense desire or urge for the substance
Social impairments
• Failure to fulfill major work, school, or home obligations because of substance use
• Recurrent interpersonal problems as a result of substance use
• Social or recreational activities given up because of substance use
Risky use
• Taking substances in physically hazardous situations
• Continued substance use despite knowledge of recurrent physical and psychological problems
Tolerance
• Marked increase in dose of substances to achieve desired effect
Withdrawal
• Taking substances to alleviate negative and aversive symptoms

Table 2
Summary of learning and decision-making theories of addiction and their computational models.

Strengths of computational model(s)
• Enables the visualization of prediction error signaling in the human brain.
• Detects subtle impairments to learning processes in addicted patients.
• Able to dissociate between goal-directed and habit systems.
• Strong neuroscientific basis.
• Robust relationships between economic demand and addiction severity.
• There is evidence that model parameters predict how well one responds to treatment.

Limitations of computational model(s)
• Interpretation of model parameters not always generalizable across studies.
• Model-free learning may not be equivalent to habits.
• Model not testable in drug-addicted humans.
• Unclear how addictive drugs alter value after long-term use.
The aim of this brief review is to examine the extent to which these theory-driven computational methods have added value to our understanding of addictive behavior in humans. We acknowledge that the scope of computational psychiatry in drug addiction is wide, and a recent descriptive review has provided an exhaustive outline of the computational models available in the field [16]. The present review, however, focuses on computational models of learning and decision-making that primarily explain selected symptoms of addiction. Computational models such as reinforcement learning and model-free / model-based algorithms of instrumental control have been used to explain impaired control over behavior in drug-addicted patients (Fig. 1), whereas a neurocomputational model of incentive sensitization and behavioral economic choice models of addiction putatively explain patients' strong desire and preference for drugs (Fig. 2). We summarize these models in Table 2 and discuss each in the context of its underlying theory to evaluate how much it advances our understanding of drug addiction. We recognize that other symptoms (e.g. tolerance, withdrawal) are also characteristic of drug addiction (Table 1), but as these symptoms have not yet been translated into testable computational models in humans, they are not part of this review.

Aberrant reinforcement learning in drug addiction
Adaptive behaviors are usually shaped by their consequences [17]. Humans are more likely to repeat actions that bring about positive outcomes and to avoid actions that lead to negative consequences. This tendency has been described as reinforcement learning: using past consequences to guide future behavior towards maximizing benefits and minimizing punishments. As drug addiction is associated with maladaptive patterns of drug use that persist despite adverse consequences, disruptions in reinforcement learning processes have been put forward as an explanation for why drug use in addicted patients is not curbed by negative outcomes [8].
Fig. 1. (A) Reinforcement learning is typically tested with a probabilistic reinforcement learning task (sometimes known as the n-armed bandit task). In a typical setup, participants are presented with several choices and are required to learn by trial and error to select the choice that either yields a reward or avoids a punishment. Task performance is then modelled computationally via a reinforcement learning algorithm (e.g. Q-learning) and a choice selection rule. Measures of interest are usually individual differences in learning, which are codified as free parameters of the algorithms, such as the learning rate and the inverse temperature. (B) Goal-directed and habitual control are modelled with model-based and model-free learning algorithms, respectively. This is normally tested in a two-step sequential decision-making task, in which participants make decisions in a two-stage process. Stage 1 consists of two choices (e.g. sun and moon), each with a fixed probability of transitioning into one of two distinct states in stage 2. After the transition, participants are again required to select one of two options in stage 2. Their choices are rewarded probabilistically, with reward probabilities that drift according to a Gaussian random walk. Task performance is then modelled with two separate algorithms. The model-free algorithm seeks to maximize expected rewards, and learning occurs by trial and error via prediction errors, irrespective of the task stages. By contrast, the model-based algorithm tracks the transition probabilities between stages 1 and 2 and keeps track of the state that has led to maximal rewards. The tendencies to engage in model-based (goal-directed) and model-free (habitual) choice selection are the main measures of interest.
Reinforcement learning can be mathematically described with computational algorithms (Fig. 1A). These algorithms deconstruct reinforcement learning into its constituent subprocesses, and a breakdown in any of these subprocesses can impair reinforcement learning. The subprocesses are codified in the free parameters of the algorithms. For example, most reinforcement learning algorithms (e.g. Q-learning) include parameters such as the learning rate and the inverse temperature (sometimes described in terms of the exploration/exploitation trade-off or reinforcement sensitivity). The learning rate reflects the impact of feedback on choices, whereas the inverse temperature parameter describes the tendency to stick to learned values when making choices [18]. It is also possible to add parameters that model psychological processes deemed relevant in addiction research. Examples include 'stickiness' (the tendency to repeat past responses) and a counterfactual learning rate (the impact of current feedback on the unselected choice), which model perseverative responding and neglect of alternative rewards, respectively [19][20][21]. Researchers typically define a priori the parameters of interest in their learning algorithms and fit them to trial-by-trial learning behavior to identify the parameter values that best approximate participants' choices [22].
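To make the learning rate and inverse temperature concrete, the sketch below simulates a Q-learning agent with a softmax choice rule on a two-armed bandit and then fits both parameters by minimizing the negative log-likelihood of its trial-by-trial choices. It is a minimal illustration, not the fitting procedure of any study cited here: the binary rewards, the reward probabilities, the grid bounds and the use of a coarse grid search (rather than a gradient-based optimizer or hierarchical Bayesian fitting) are assumptions chosen for readability, and helper names such as `simulate_bandit` and `fit_grid` are hypothetical.

```python
import numpy as np

def softmax(q, beta):
    """Softmax choice rule; beta is the inverse temperature."""
    z = beta * (q - q.max())           # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def simulate_bandit(alpha, beta, reward_probs, n_trials, rng):
    """Simulate a Q-learning agent on a bandit with binary (0/1) rewards."""
    q = np.zeros(len(reward_probs))
    choices = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    for t in range(n_trials):
        c = rng.choice(len(q), p=softmax(q, beta))
        r = float(rng.random() < reward_probs[c])
        q[c] += alpha * (r - q[c])     # prediction error scaled by the learning rate
        choices[t], rewards[t] = c, r
    return choices, rewards

def neg_log_likelihood(alpha, beta, choices, rewards, n_options):
    """Negative log-likelihood of observed choices under (alpha, beta)."""
    q = np.zeros(n_options)
    nll = 0.0
    for c, r in zip(choices, rewards):
        nll -= np.log(softmax(q, beta)[c] + 1e-12)
        q[c] += alpha * (r - q[c])
    return nll

def fit_grid(choices, rewards, n_options):
    """Return the (alpha, beta) pair on a coarse grid that minimizes the NLL."""
    alphas = np.linspace(0.05, 0.95, 19)
    betas = np.linspace(0.5, 10.0, 20)
    return min(((a, b) for a in alphas for b in betas),
               key=lambda ab: neg_log_likelihood(ab[0], ab[1], choices, rewards, n_options))

rng = np.random.default_rng(0)
choices, rewards = simulate_bandit(alpha=0.3, beta=5.0, reward_probs=[0.8, 0.2],
                                   n_trials=300, rng=rng)
alpha_hat, beta_hat = fit_grid(choices, rewards, n_options=2)
print(f"recovered alpha={alpha_hat:.2f}, beta={beta_hat:.2f}")
```

With real data, the same likelihood function would be evaluated on participants' recorded choices and outcomes; fitting simulated data with known parameters, as here, is a common check that a task can identify the parameters at all.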
A substantial body of research has confirmed that reinforcement learning subprocesses are largely underpinned by dopaminergic neurons and frontostriatal networks [23][24][25][26][27][28]. Addictive drugs are known to act on the midbrain and the mesolimbic system [29], and long-term drug use is associated with dopaminergic downregulation and frontostriatal impairment [4,30,31]. It is therefore possible that impaired reinforcement learning processes in drug-addicted patients reflect these neuroadaptive changes [7,8]. Recent research has applied computational models to quantify alterations to reinforcement learning processes in brain and behavior, providing nuances and insights that conventional measures (e.g. summary or mean scores of behavioral responses) could not. So far, two clear themes have emerged from the addiction literature: altered prediction error signaling in the associated neural pathways, and impaired learning subprocesses.
Prediction errors, defined as the discrepancy between expected and received outcomes, serve as learning signals for behavior such that adaptive actions are those that minimize these errors [32]. Studies of neural prediction errors in humans emerged when functional neuroimaging and computational learning algorithms were combined to visualize these signals, a method known as model-based neuroimaging [33]. Using this method, prediction errors are computed trial by trial and applied as a parametric modulator to functional imaging (e.g. fMRI) data to identify neural activity that correlates with these signals [23,34-36]. Several studies using this method found reduced prediction error signaling within the frontostriatal network of drug-addicted patients, especially in the striatum, the medial orbitofrontal cortex and the medial frontal gyrus [37][38][39]. A recent meta-analysis further points towards blunted prediction error signaling in the striatum of drug-addicted patients [40]. In keeping with preclinical studies, these findings indicate that long-term use of addictive drugs can disrupt prediction error signaling, rendering behavior less sensitive to salient reinforcement [41]. There is also evidence for diminished functional coupling between the ventral striatum, a key hub for prediction error signaling, and the dorsolateral prefrontal cortex in alcohol use disorder [42]. This raises the possibility that prediction errors are not effectively communicated within the brain. Taken together, these findings suggest that the prediction error signals implicated in learning are dysfunctional in drug-addicted patients, with repercussions for their behavior.
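The model-based neuroimaging approach described above can be sketched as follows: prediction errors from a Q-learning model are mean-centered, placed at outcome onsets as a parametric modulator, and convolved with a hemodynamic response function to give one regressor for a general linear model. This is an illustrative sketch under our own assumptions, not a reproduction of any cited pipeline: the double-gamma HRF shape (SPM-like defaults of 6 and 16 with a 1:6 undershoot ratio), sampling the regressor at the start of each scan, the toy choices and outcomes, and the helper names are all hypothetical, and fitting the GLM to voxel time series is omitted.

```python
import numpy as np
from math import gamma

def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1 / 6):
    """Double-gamma HRF built from unit-scale gamma densities (parameters assumed)."""
    g1 = t ** (peak_shape - 1) * np.exp(-t) / gamma(peak_shape)
    g2 = t ** (under_shape - 1) * np.exp(-t) / gamma(under_shape)
    h = g1 - ratio * g2
    return h / h.sum()

def prediction_errors(choices, rewards, alpha, n_options):
    """Trial-by-trial prediction errors (delta = r - Q) from a Q-learning model."""
    q = np.zeros(n_options)
    deltas = np.empty(len(choices))
    for t, (c, r) in enumerate(zip(choices, rewards)):
        deltas[t] = r - q[c]
        q[c] += alpha * deltas[t]
    return deltas

def parametric_regressor(onsets, modulator, n_scans, tr, dt=0.1):
    """Mean-centered modulator at outcome onsets, convolved with the HRF, sampled per scan."""
    pm = np.asarray(modulator, float) - np.mean(modulator)
    n_fine = int(round(n_scans * tr / dt))
    sticks = np.zeros(n_fine)                    # high-resolution stick function
    np.add.at(sticks, np.round(np.asarray(onsets) / dt).astype(int), pm)
    hrf = double_gamma_hrf(np.arange(0.0, 32.0, dt))
    signal = np.convolve(sticks, hrf)[:n_fine]
    return signal[:: int(round(tr / dt))][:n_scans]

# Toy choices and outcomes (illustrative only, not real data)
choices = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])
rewards = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 1], dtype=float)
deltas = prediction_errors(choices, rewards, alpha=0.3, n_options=2)
onsets = 4.0 + 12.0 * np.arange(len(choices))    # outcome onsets in seconds
regressor = parametric_regressor(onsets, deltas, n_scans=70, tr=2.0)
```

Mean-centering means the regressor captures trial-to-trial variation in prediction error over and above the average response to outcomes; a constant modulator therefore yields an all-zero regressor.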

Fig. 2. Task paradigms and computational methods that model strong desires for drugs in addicted patients. (A) The incentive sensitization theory is often tested in humans with a cue-reactivity paradigm. Participants are presented with a series of drug and neutral cues during an fMRI scan to identify the neural activity associated with each cue. Greater neural activity in response to drug cues is presumed to be evidence of heightened incentive salience. A neurocomputational model was developed from findings of behavioral neuroscience studies to model these signals quantitatively. This model suggests that incentive salience is determined not only by learned value but also by physiological states, and that addictive drugs are capable of altering these states to cause persistent 'wanting' of drugs. However, this model has not been tested in addicted humans. (B) The behavioral economic perspective of addiction proposes that drug addiction results from an overvaluation of drugs, which is tested with a drug purchase task and an economic demand model. In a typical experiment, participants are presented with different price points for drugs or alcohol. They are then asked to indicate (hypothetically) how much of the drug they would consume at each price point. The individual responses are then fitted to an economic demand curve, and key measures such as demand intensity and demand elasticity are derived from the economic model. It is hypothesized that demand intensity (maximum consumption at no cost) and demand elasticity (sensitivity to changes in price) are associated with vulnerability to developing addiction.
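The economic demand analysis in panel B of Fig. 2 can be illustrated with the exponential demand equation of Hursh and Silberberg, one widely used form in which Q0 captures demand intensity and alpha captures demand elasticity (how quickly consumption falls as price rises). The sketch below makes several simplifying assumptions of ours: the span constant k is fixed at 2, the price points and parameter values are arbitrary, the purchase-task responses are noiseless synthetic data generated from the model itself rather than taken from any study, zero consumption values are simply dropped before log transformation, and `fit_demand` is a hypothetical grid-search helper rather than a standard tool.

```python
import numpy as np

def log_demand(price, q0, alpha, k=2.0):
    """log10 consumption under the exponential demand equation (k assumed fixed)."""
    return np.log10(q0) + k * (np.exp(-alpha * q0 * price) - 1.0)

def fit_demand(prices, consumption, k=2.0):
    """Grid-search fit of demand intensity (q0) and elasticity (alpha)."""
    prices = np.asarray(prices, float)
    consumption = np.asarray(consumption, float)
    keep = consumption > 0                      # log10(0) is undefined; drop zeros
    p, y = prices[keep], np.log10(consumption[keep])
    q0_grid = np.linspace(1.0, 20.0, 96)        # steps of 0.2
    alpha_grid = np.logspace(-4, -1, 301)       # steps of 0.01 in log10 units
    pred = log_demand(p[None, None, :], q0_grid[:, None, None],
                      alpha_grid[None, :, None], k)
    sse = ((pred - y) ** 2).sum(axis=2)         # squared error in log10 units
    i, j = np.unravel_index(np.argmin(sse), sse.shape)
    return q0_grid[i], alpha_grid[j]

prices = np.array([0, 0.25, 0.5, 1, 2, 3, 5, 7, 10, 15, 20])
true_q0, true_alpha = 10.0, 0.01
consumption = 10 ** log_demand(prices, true_q0, true_alpha)   # synthetic responses
q0_hat, alpha_hat = fit_demand(prices, consumption)
print(f"demand intensity Q0 ~ {q0_hat:.1f}, elasticity alpha ~ {alpha_hat:.4f}")
```

At a price of zero the exponential term vanishes, so predicted consumption equals Q0, which is why Q0 is read as consumption at no cost.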
The deconstruction of reinforcement learning processes has also yielded valuable insights into the behavioral profile of drug-addicted patients. For example, reduced punishment learning rates have been reported in both cocaine-addicted patients and animals, suggesting that negative feedback has relatively little effect on their subsequent behavior [21,43,44]. Non-modeling studies have also corroborated the notion that cocaine-addicted patients are less amenable to negative feedback [45][46][47][48]. There is also evidence for reward learning deficits in drug-addicted patients [49][50][51], but this evidence is less consistent than that for punishment learning [52,53]. Other computational studies have identified increased 'stickiness' to prior choices (i.e. a greater tendency to repeat previous responses) [20] and a tendency to ignore alternative, but more rewarding, choices (i.e. a reduced counterfactual learning rate) in drug-addicted patients [19]. Thus, a blunted prediction error mechanism, coupled with the reduced impact of negative outcomes, increased stickiness, and the tendency to neglect alternative rewards, may collectively explain why negative outcomes fail to regulate learned behaviors effectively in drug-addicted patients. This in turn increases the risk that their behavior spirals out of control.
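The additional parameters discussed above can be written as small extensions of the Q-learning update. The sketch below is one hypothetical parameterization, assumed for illustration rather than taken from the cited studies: it splits the learning rate by the sign of the prediction error (reward versus punishment learning), updates unchosen options towards their counterfactual outcomes with a separate counterfactual learning rate (which presupposes that those outcomes are shown or can be inferred), and adds a stickiness bonus for the previously chosen option inside the softmax.

```python
import numpy as np

def update_values(q, choice, reward, alpha_reward, alpha_punish,
                  alpha_cf=0.0, cf_rewards=None):
    """One trial of Q-learning with valence-specific and counterfactual learning rates."""
    q = np.array(q, dtype=float)                # work on a copy
    delta = reward - q[choice]
    lr = alpha_reward if delta >= 0 else alpha_punish
    q[choice] += lr * delta
    if cf_rewards is not None:                  # outcomes of the unchosen options
        for j in range(len(q)):
            if j != choice:
                q[j] += alpha_cf * (cf_rewards[j] - q[j])
    return q

def choice_probs(q, prev_choice, beta, stickiness):
    """Softmax over values plus a bonus for repeating the previous choice."""
    drive = beta * np.asarray(q, dtype=float)
    if prev_choice is not None:
        drive[prev_choice] += stickiness
    drive -= drive.max()                        # numerical stability
    p = np.exp(drive)
    return p / p.sum()

q = np.array([0.5, 0.5])
# With a low punishment learning rate, a loss moves the chosen value
# much less than a win of the same size does.
after_loss = update_values(q, choice=0, reward=0.0, alpha_reward=0.5, alpha_punish=0.1)
after_win = update_values(q, choice=0, reward=1.0, alpha_reward=0.5, alpha_punish=0.1)
# With equal values, stickiness alone biases choice towards the last response.
p_repeat = choice_probs(q, prev_choice=1, beta=3.0, stickiness=1.0)
```

In a fitted model each of these parameters would be estimated per participant, which is how group differences such as a reduced punishment learning rate or increased stickiness are quantified.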
There is a prevailing assumption that computational parameters of reinforcement learning isolate unique cognitive processes and, consequently, that these parameters are readily generalizable across studies [54]; this is one of the primary reasons why computational psychiatry methods are so appealing [13,15,55]. However, this assumption does not necessarily hold. A recent study tested the generalizability of parameters by fitting identical model parameters to different cognitive tasks performed by the same participants and found striking differences in parameter values between tasks [56]. Moreover, most parameters were not significantly correlated with their counterparts in a different task, suggesting that they do not generalize across contexts despite their similar mathematical definitions. Indeed, within the addiction literature, studies using similar computational parameters have reported seemingly contradictory findings. For example, Lim et al. [21] reported that stimulant-addicted patients show a significantly reduced punishment learning rate during reinforcement learning. By contrast, Kanen et al. [20], who re-analyzed task performance on a serial probabilistic reversal learning paradigm in a similar patient group, found the opposite, namely a significant increase in the punishment learning rate. These examples point to the limited generalizability of these parameters and highlight the need to interpret modeling results within the context of the behavioral paradigm tested. Although the tasks used by Lim et al. [21] and Kanen et al. [20] both involve probabilistic learning from reinforcement, optimal performance in the reversal learning paradigm requires a balance between responding to and ignoring negative feedback: occasional negative feedback for the correct choice is expected and should be ignored, whereas consistent negative feedback signals that the learning rule has changed (i.e. a contingency reversal).
Thus, the increased punishment learning rate in stimulant-addicted patients reported by Kanen et al. [20] might reflect an increased tendency to switch responses after any negative feedback, which may impair performance in a volatile environment. However, when the learning context is stable, as in the study by Lim et al. [21], stimulant-addicted patients show a marked reduction in punishment learning. Therefore, while computational parameters reveal novel and interesting insights into the behavioral profile of drug-addicted patients, their interpretation must be grounded in the behavioral context in question.

Goal-directed and habitual control in drug addiction
Drug use in addicted patients, while initially driven by its rewarding effects, becomes progressively maladaptive over time as drug-seeking persists despite adverse consequences. It has been suggested that instrumental actions, such as drug-seeking, are regulated by two dissociable systems [57,58]. On the one hand, the goal-directed system (underpinned by the ventromedial fronto-striatal pathways [59][60][61][62][63]) regulates adaptive behavior that is flexible and sensitive to consequences. On the other hand, the habit system (subserved by the premotor-dorsal-striatal circuit [64,65]) mediates well-learned actions that are automatically elicited by environmental cues but are no longer sensitive to immediate consequences. These two systems work in parallel to support adaptive behavior [57], but addictive drugs are thought to disrupt the balance between them [5,6]. Everitt and Robbins [5,6] proposed that drug addiction reflects the endpoint of a transition from goal-directed to habitual and ultimately compulsive drug-seeking. According to this theory, drug use becomes dysregulated when drug-seeking habits, elicited by drug-associated stimuli via conditioned reinforcement, are no longer subject to top-down control. Aberrant drug-seeking has been linked to neuroadaptive changes within distinct corticostriatal circuitries, such that control over drug-seeking progressively shifts from ventral striatal (implicated in goal-directed actions) to dorsal striatal subsystems (implicated in habits) [5,6,66]. At the same time, addictive drugs also impair prefrontal cortical functions that normally regulate behavior, such as inhibitory control [67] and reinforcement learning [68,69], which further exacerbates the dysregulation of drug-seeking.
It is noteworthy that habit formation is not pathological per se; it facilitates routines and mundane activities, increasing our capacity to automate behavior for efficiency [70]. However, habits in drug-addicted patients may become pathological once they are automatically elicited by drug-associated stimuli, coupled with an impaired goal-directed system that cannot exert control over behavior that has become maladaptive.
In experimental psychology, habits are operationally defined as the absence of deliberate goal-directed action. A behavior is classed as habitual when it a) is not sensitive to the consequences of the action, and b) continues even when it is no longer needed to obtain rewards [58,71,72]. Experimentalists typically test these conditions using outcome devaluation and contingency degradation paradigms, respectively [73,74]. However, neither paradigm tells us much about the effectiveness of the habit system, or of its counterpart, the goal-directed system, in regulating behavior. A predominance of the habit system in drug addiction could reflect an enhanced habit system, an impaired goal-directed system, or a dysregulation between the two [75,76]. To dissociate the two systems, computational models of model-based and model-free reinforcement learning, hypothetical constructs thought to model goal-directed and habit learning, respectively, have been developed [77]. The model-based learning system prospectively evaluates actions against an internal model to identify the best course of action, mirroring the goal-directed system's sensitivity to causality and outcomes. By contrast, the model-free learning system maximizes future reward by repeating actions that have been rewarded in the past. As behaviors learned via the model-free system are not immediately sensitive to outcomes, they have been suggested to represent habits. The model-based and model-free systems are supported by distinct neural systems, which, interestingly, differ from those that subserve goal-directed actions and habits, respectively. Whilst model-based behavior has been associated with prefrontal-parietal cortices, model-free behavior has been linked more strongly with the ventral striatum [78][79][80].
The transition from model-based to model-free control over behavior is thought to depend on the relative uncertainty of each system, with the less uncertain system controlling behavior. Accordingly, the model-free system takes over control when its uncertainty falls below that of the model-based system, as happens after extensive training [77]. Simulated data from this model appear consistent with typical outcome devaluation performance in rats [77].
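The uncertainty-based arbitration idea can be caricatured in a few lines. The sketch below is a deliberately simplified toy, not the Bayesian model of [77]: it assumes that the model-free system's uncertainty shrinks like the posterior variance of a Gaussian value estimate as experience accumulates, that the model-based system has a fixed uncertainty floor (standing in for the noise and cost of planning), and that whichever system is less uncertain controls behavior. All names and numbers are hypothetical.

```python
def mf_uncertainty(n_trials, prior_var=1.0, noise_var=0.5):
    """Posterior variance of a Gaussian value estimate after n_trials observations."""
    return 1.0 / (1.0 / prior_var + n_trials / noise_var)

def controlling_system(n_trials, mb_uncertainty=0.05):
    """Hand control to whichever system is currently less uncertain."""
    return "model-free" if mf_uncertainty(n_trials) < mb_uncertainty else "model-based"

switch_point = next(n for n in range(10_000) if controlling_system(n) == "model-free")
print(f"model-free control takes over after {switch_point} training trials")  # -> 10
```

Under these assumptions training length alone shifts control from the model-based to the model-free system, which is, as we understand it, the qualitative pattern the original model uses to account for insensitivity to outcome devaluation after extensive training.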
Independent of computational models, converging lines of experimental evidence from preclinical and human studies provide empirical support for the habit theory of addiction. These studies have shown that exposure to psychostimulant drugs and alcohol enhances habit formation for both drug- and non-drug-related behaviors in rodents [82][83][84][85][86], that control over cocaine-seeking habits depends on dorsal striatal mechanisms [87,88], and that alcohol- and cocaine-addicted patients show a general bias towards habitual responses in behavioral tasks [46,89-91]. Psychostimulant drug users also report increased habitual tendencies in their daily lives, as measured with self-report instruments (e.g. the Creature of Habit Scale) [89,92]. However, the causal link between habitual drug-seeking and compulsivity, which has been demonstrated in rodent studies of cocaine and alcohol [93][94][95], cannot easily be replicated in the same way in humans for obvious ethical reasons. In other words, the distinction between drug-seeking and drug-taking, though crucial for this theory and dissociable in animal models, may not readily translate to behavioral paradigms in humans. Consequently, the model-free / model-based computational framework is not designed to test cue-elicited drug-seeking habits in humans. Nevertheless, there has been growing interest in applying computational methods to model these instrumental mechanisms in humans, to test the extent to which addicted patients rely on habitual mechanisms of behavior in a non-drug-related context. Within the model-based / model-free computational framework, it is conceivable that drug addiction is linked with an increased reliance on model-free learning, as well as a reduced tendency to engage in model-based learning [81].
The key contribution of the model-based / model-free theory is that it allows goal-directed and habitual control to be dissociated in drug-addicted patients, which has predominantly been tested with a two-step sequential decision-making task [79,80] (Fig. 1B). In this task, participants are presented with two decision-making stages, each consisting of two choices. Choices made in the first stage have a fixed probability of transitioning to one of two possible states, whereas choices in the second stage are rewarded with probabilities that drift according to a Gaussian random walk (Fig. 1B). Model-based and model-free learners are thought to show different response strategies. Model-based learners guide their actions by the transition probabilities between the two stages, learning which first-stage choice is most likely to lead to a previously rewarded second-stage state. In other words, they build a cognitive map of the task structure that maps all choices to possible eventualities, akin to the action-outcome learning important for goal-directed behavior. By contrast, individuals who rely more on model-free learning tend to repeat choice sequences that led to a reward, irrespective of the transition probability between the two stages, an approach that has been suggested to reflect habit learning. The two-step task has been administered to a wide range of drug-addicted patients, including patients addicted to alcohol [96] and methamphetamine [97], who showed impaired model-based learning with preserved model-free learning. This task has also shown that reduced model-based learning and greater alcohol expectancies jointly predicted relapse in treatment-seeking alcohol drinkers [98]. At face value, these observations suggest that the habit bias associated with drug addiction is due to an impaired goal-directed system rather than an overactive habit system. However, the transition from model-based to model-free control has not been directly investigated in the context of drug addiction.
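The contrast between the two strategies can be sketched with the hybrid model commonly fitted to two-step data, in which first-stage choices are driven by a weighted mix, w·Q_MB + (1 - w)·Q_MF, of model-based and model-free values. The simulation below is a simplified sketch under our own assumptions: one learning rate and inverse temperature for both stages, an eligibility trace of 1 (so model-free first-stage values move directly towards the final reward), reward probabilities drifting by a clipped Gaussian random walk between 0.25 and 0.75, and a 70/30 transition structure. It then runs the classic 'stay' analysis: a model-free agent repeats rewarded first-stage choices regardless of the transition, whereas a model-based agent shows a reward × transition interaction.

```python
import numpy as np

def p_option1(values, beta):
    """Softmax probability of choosing option 1 out of two."""
    return 1.0 / (1.0 + np.exp(-beta * (values[1] - values[0])))

def simulate_two_step(w, alpha=0.5, beta=5.0, n_trials=10_000, p_common=0.7, seed=0):
    """Hybrid agent: w = 1 is purely model-based, w = 0 purely model-free."""
    rng = np.random.default_rng(seed)
    q_mf = np.zeros(2)                           # stage-1 model-free values
    q2 = np.zeros((2, 2))                        # stage-2 values [state, option]
    trans = np.array([[p_common, 1 - p_common],
                      [1 - p_common, p_common]]) # P(state | stage-1 action)
    reward_p = rng.uniform(0.25, 0.75, size=(2, 2))
    a1s = np.empty(n_trials, dtype=int)
    rewards = np.empty(n_trials)
    common = np.empty(n_trials, dtype=bool)
    for t in range(n_trials):
        q_mb = trans @ q2.max(axis=1)            # plan through the transition model
        q_net = w * q_mb + (1 - w) * q_mf
        a1 = int(rng.random() < p_option1(q_net, beta))
        state = a1 if rng.random() < p_common else 1 - a1
        a2 = int(rng.random() < p_option1(q2[state], beta))
        r = float(rng.random() < reward_p[state, a2])
        q2[state, a2] += alpha * (r - q2[state, a2])
        # Equivalent to the two-stage TD update when the eligibility trace is 1
        q_mf[a1] += alpha * (r - q_mf[a1])
        a1s[t], rewards[t], common[t] = a1, r, state == a1
        reward_p = np.clip(reward_p + rng.normal(0.0, 0.025, size=(2, 2)), 0.25, 0.75)
    return a1s, rewards, common

def stay_probabilities(a1s, rewards, common):
    """P(repeat stage-1 choice), split by previous reward and transition type."""
    stay = a1s[1:] == a1s[:-1]
    prev_rewarded, prev_common = rewards[:-1] == 1.0, common[:-1]
    return {(rw, cm): stay[(prev_rewarded == rw) & (prev_common == cm)].mean()
            for rw in (True, False) for cm in (True, False)}

def interaction(p):
    """Reward x transition interaction on stay probabilities (model-based signature)."""
    return (p[True, True] - p[True, False]) - (p[False, True] - p[False, False])

for w in (0.0, 1.0):
    p = stay_probabilities(*simulate_two_step(w))
    print(f"w={w}: reward x transition interaction = {interaction(p):+.2f}")
```

In empirical work the weight w is typically estimated per participant, and a lower w is what is reported as reduced model-based control.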
Notably, reduced model-based control in human drug users has not been consistently replicated in published addiction research, as subsequent work found no association between the severity of alcohol use and alterations to either model-based or model-free control [99,100]. Furthermore, experiments using a rodent version of the two-step task revealed a different pattern: model-free, but not model-based, learning predicted methamphetamine self-administration [101]. These inconsistencies have cast doubt on whether model-based and model-free learning validly represent the goal-directed and habit systems. Although model-based learning seems to reflect the goal-directed system adequately, as it is sensitive to action-outcome contingencies, this may not be true of model-free learning, which does not capture the autonomous, stimulus-elicited nature of habits. In essence, model-free learning involves a reward-maximization process, such that outcomes still affect behavior, albeit less rapidly than in model-based learning. In other words, model-free learning is still value-dependent, which contradicts the experimental definition of habits as independent of outcome value. When the two-step task is compared with the outcome devaluation task, an established paradigm for measuring habits, model-free learning is not statistically related to experimental measures of habits [102,103]. These studies further support the notion that, contrary to widely held assumptions, model-free learning is not equivalent to habits.
In light of these limitations, Miller and colleagues proposed a new computational model that describes habits as a product of pure repetition, with no direct relationship to value (so-called 'value-free' habits) [104]. Their model assumes that the goal-directed and habit systems are dissociable in the brain. It therefore comprises a goal-directed controller, which tracks state transitions and reward values, and a habitual controller, which updates the habit strength of each action based on how often it has been repeated. An arbiter then determines the final behavior based on the strength of each controller. Whilst this model has yet to be tested empirically, it is consistent with existing knowledge on the construct of habits, i.e. that habits form after extensive repetition and have well-characterized neural substrates. Importantly, the model's predictions seem to explain prior data on contingency degradation, which the model-based / model-free account fails to explain [105,106]. Establishing the translational validity of this model with human data is therefore warranted. One suggested approach is for future studies to fit both the model-free algorithm and the Miller algorithm to empirical data and apply model comparison procedures to identify which provides a better fit to behavioral data.
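The core idea of value-free habits can be sketched as follows: habit strength is updated only by whether an action was taken, never by its outcome, while a goal-directed controller tracks outcome values, and a weighting between the two sets choice probabilities. This is a simplified illustration under our own assumptions, not the published implementation: the goal-directed controller is collapsed to a delta-rule learner over action values (no state transitions), the learning rates and inverse temperatures are arbitrary, the function names are hypothetical, and the arbiter is reduced to a fixed weight `w_habit`, whereas the published model sets the weighting dynamically. The example shows why such a model predicts persistence after outcome devaluation.

```python
import numpy as np

def update_habit(h, chosen, alpha_h):
    """Value-free update: strength moves towards 1 for the chosen action, 0 otherwise."""
    target = np.zeros_like(h)
    target[chosen] = 1.0
    return h + alpha_h * (target - h)

def update_goal(q, chosen, reward, alpha_g):
    """Goal-directed value learning, simplified to a delta rule on action values."""
    q = q.copy()
    q[chosen] += alpha_g * (reward - q[chosen])
    return q

def choice_probs(q, h, w_habit, beta_g=5.0, beta_h=5.0):
    """Fixed-weight stand-in for the arbiter that combines both controllers."""
    drive = (1 - w_habit) * beta_g * q + w_habit * beta_h * h
    p = np.exp(drive - drive.max())
    return p / p.sum()

# Extensive training: action 0 is repeated and rewarded on every trial.
q, h = np.zeros(2), np.zeros(2)
for _ in range(200):
    q = update_goal(q, 0, 1.0, alpha_g=0.2)
    h = update_habit(h, 0, alpha_h=0.1)

# Outcome devaluation: the goal-directed value of action 0 drops to zero,
# but the habit strength built up by repetition is untouched.
q_devalued = np.array([0.0, q[1]])
print(choice_probs(q_devalued, h, w_habit=0.0))   # purely goal-directed: indifferent
print(choice_probs(q_devalued, h, w_habit=0.5))   # habit term still favors action 0
```

Fitting a model of this kind alongside a model-free algorithm, and comparing the two with a penalized likelihood criterion, is one way the model comparison suggested above could be set up.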

Cue-elicited urge in drug addiction: an incentive sensitization approach
A strong urge for drugs is thought to play a key role in the persistence of addictive behaviors (Table 1). Drug-addicted patients often report a strong urge to use the drug, particularly when triggered by environmental stimuli previously associated with the drug. This urge occurs not only during periods of active drug use but also after long periods of abstinence, rendering patients vulnerable to relapse. While the 'must do' drug urge has been suggested to be mediated by the conditioned reinforcement of drug-conditioned stimuli [6,107-109], another explanation has been provided by the theory of incentive sensitization [9,12,110]. The latter postulates a neuropsychological basis for drug craving, namely that long-term use of addictive drugs sensitizes the brain's motivational system, enabling drugs and drug-linked stimuli to acquire salience through associative learning mechanisms [9,12,110]. This process is hypothesized to increase the subjective 'wanting' of the drug, such that the brain becomes hyper-reactive to any drug-conditioned stimulus, eliciting a 'near-compulsive' urge to use drugs [9]. Indeed, several studies using cue-reactivity paradigms (Fig. 2A) have shown that drug-related cues elicit greater neural and physiological responses than neutral cues in drug-addicted patients (see [111,112] for reviews). Based on incentive sensitization theory, increased cue-induced brain activity is presumed to reflect increased salience of these cues, but this assertion has, to the best of our knowledge, not yet been empirically tested in humans.
According to incentive sensitization theory, the attribution of incentive salience to drug cues involves an interaction between addictive drugs and the dopaminergic system. These drugs (e.g. cocaine, nicotine and heroin) sensitize the mesolimbic dopaminergic system, such that any stimulus that is predictive of drugs acquires incentive salience [ 12 , 110 ]. Initial proposals suggested that the temporal difference prediction error models used in reinforcement learning could provide a candidate computational framework for the attribution of incentive salience [113] . According to this proposal, dopaminergic responses elicited during conditioning play a dual role: one in reinforcement learning and one in attributing incentive salience [113] . However, this proposal assumes that learned values translate directly into motivation, which is not always true. For instance, one can have a strong 'liking' for cake but not 'want' to eat cake after a very filling meal. This simple example illustrates how physiological states (e.g. satiety) may modulate incentive salience. To dissociate learned values from incentive salience, Zhang et al. [114] developed an alternative dynamic neurocomputational model of incentive salience, henceforth referred to as the 'Zhang model' ( Fig. 2A ). The authors introduced a physiological factor into the temporal difference equation, which accounts for the physiological states that can modulate incentive salience. As with hunger, greater drug-deprivation states (i.e. higher values of this physiological factor) heighten incentive salience [ 114 , 115 ]. The authors further hypothesized that this parameter is hijacked by addictive drugs, which continuously amplify the 'wanting' state in drug-addicted patients.
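The dissociation between learned value and incentive salience can be illustrated with a minimal sketch. Here, the value of a cue is first learned by temporal difference (TD(0)) learning in a normal physiological state, and a physiological factor (here called `kappa`) then scales the cue-triggered 'wanting' at test. Zhang et al. [114] place the physiological factor inside the temporal difference equation itself; the multiplicative gain on the learned cue value used below is a simplifying assumption, and the function names, discount factor and learning rate are hypothetical.

```python
GAMMA = 0.9  # temporal discount factor (illustrative value)


def learn_cue_value(n_trials=500, alpha=0.1, reward=1.0, gamma=GAMMA):
    """TD(0) learning over a two-step Pavlovian trial: cue -> outcome -> end.

    Values are learned in a normal physiological state, so the learned
    cue value converges to gamma * reward.
    """
    v_cue, v_out = 0.0, 0.0
    for _ in range(n_trials):
        # Cue -> outcome: no reward at cue onset, bootstrap from the outcome value
        v_cue += alpha * (0.0 + gamma * v_out - v_cue)
        # Outcome -> end of trial: reward delivered, terminal value is 0
        v_out += alpha * (reward - v_out)
    return v_cue, v_out


def incentive_salience(v_cue, kappa):
    """Hypothetical cue-triggered 'wanting': learned value scaled by a state factor.

    kappa == 1: normal state, wanting equals the learned value.
    kappa > 1:  e.g. deprivation or a drug-sensitized state amplifies wanting.
    kappa < 1:  e.g. satiety dampens wanting.
    No relearning is needed: wanting changes as soon as kappa changes.
    """
    return kappa * v_cue


v_cue, _ = learn_cue_value()
for kappa in (0.5, 1.0, 2.0):
    print(kappa, round(incentive_salience(v_cue, kappa), 3))
```

Because `kappa` enters only where the learned value is read out, changing it alters 'wanting' immediately without any new learning, which is the property that distinguishes incentive salience from learned value in this account.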
The Zhang model of incentive sensitization has received some support from experimental observations. Some studies have suggested that neurons in the ventral pallidum fire selectively when a desired conditioned stimulus (CS) is encountered [ 116 , 117 ]. Berridge [115] suggested that the ventral pallidum may be a neural substrate that uniquely encodes incentive salience. Acute exposure to amphetamine or opioids has been shown to markedly enhance ventral pallidal neuronal firing to a previously learned CS, putatively suggesting that addictive drugs can modulate an existing physiological state to amplify 'wanting' signals [ 116 , 118 ]. There is also some evidence that this sensitization effect is long-lasting. In support of this, Berridge and colleagues showed that rats exposed to amphetamine prior to Pavlovian testing exerted greater vigor in conditioned reward-seeking behaviors, even when the conditioned response had been trained before amphetamine exposure [ 119 , 120 ]. They therefore concluded that amphetamine induced long-lasting changes in the CS's effect on reward-seeking (via incentive sensitization), although earlier studies found that amphetamine infusions into the nucleus accumbens enhanced the behavioral effects of the conditioned reinforcer rather than the CS [ 108 , 109 ]. Nevertheless, the Zhang model proposes that addictive drugs modify the 'wanting' state independently of learned values, triggering a state-induced modulation of incentive salience. Although the Zhang model offers a putative explanation of incentive salience within a computational framework, it has, to the best of our knowledge, not yet been tested in drug-addicted patients, who are known to show increased neural responses to drug-related cues in cue-reactivity paradigms [121][122][123] . It is difficult to disentangle whether the enhanced cue reactivity in drug-addicted patients reflects previously learned values (i.e. predictive of reward) or incentive salience (i.e. increased 'wanting'). This critical question can only be addressed by human paradigms (behavioral or neural) that dissociate learned values from incentive salience, but such paradigms are not available at present. Notably, activations in the same brain regions seen in cue-reactivity paradigms can also be induced by the drugs themselves [124][125][126] , but the latter are more likely to reflect the pharmacological effect of sensitization than incentive salience. The majority of evidence for incentive sensitization (and the related neurocomputational model) comes from animal studies, so a translational gap between animals and humans remains for this theory. For instance, it is unclear how the physiological states parametrized in this computational model (i.e. the physiological-state parameter) could actually be measured in drug-addicted patients. Nonetheless, there have been attempts to link incentive salience to other computational processes implicated in addictive behavior, such as model-free learning. A recent study reported that individuals who attribute incentive salience to reward-conditioned cues (i.e. sign-trackers) were more likely to engage in model-free learning than those who do not [127] . However, it remains to be clarified whether this increased tendency to engage in model-free learning is driven by incentive salience or by the conditioned reinforcing properties of the CS. This interesting observation tentatively reveals a process underpinning the dysregulation of maladaptive habits, but whether this process translates to pathological drug use in humans remains to be shown. Future studies are warranted to replicate and extend these findings in drug-addicted patients.

Exaggerated goal narrowing: addiction from a behavioral economic perspective
Theories of behavioral economics emphasize the importance of drug choices in the development of addictive behavior. Accordingly, drug use in addicted patients has been explained in terms of impaired decision-making [ 10 , 128 ]. According to this view, uncontrolled drug use primarily reflects an overvaluation of drugs, such that the drug is consistently chosen over its alternatives [ 10 , 11 ]. Pathological use has therefore been argued to reflect increased goal narrowing, whereby drug-addicted patients spend a great deal of time, effort, and creativity seeking out and using drugs [ 129 , 130 ].
From a behavioral economic perspective, addictive drugs are thought to carry greater subjective value for drug-addicted patients than alternative choices. This hypothesis is typically tested with the drug purchase task, and participants' responses are then fitted to an economic demand model ( Fig. 2B ). This model examines the value of drugs in terms of economic demand, defined as the consumption of a commodity (i.e. drugs) at a given cost [131] . Higher economic demand indicates that drugs are highly valued, and this is captured by two key outputs of the model: demand intensity and demand elasticity. Demand intensity reflects the degree to which drugs are consumed when there are no costs involved. By contrast, demand elasticity describes how sensitive consumption is to changes in cost [132] . It has been hypothesized that drug addiction is characterized by greater demand intensity (i.e. more drugs consumed when freely available) and lower elasticity (i.e. consumption continues despite increased costs), and that variation in these parameters reflects addiction severity [ 10 , 133 , 134 ]. In other words, the risk of developing addictive behaviors is expected to be greater in individuals who consume more drugs when they are free (high demand intensity) and whose drug use is relatively unaffected by changes in cost, i.e. who show an inflexible consumption pattern (low demand elasticity).
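Demand intensity and elasticity are obtained by fitting a demand equation to the consumption reported at each price in the purchase task. The section does not specify which equation is used; the sketch below assumes the exponentiated demand equation of Koffarnus and colleagues, a common variant of Hursh and Silberberg's exponential model, in which `q0` indexes demand intensity and `alpha` indexes elasticity (a larger `alpha` means consumption falls faster as price rises). The scaling constant `k` is fixed at an arbitrary value, the data are simulated rather than real, and the crude grid search stands in for the nonlinear least-squares routines normally used.

```python
import math


def demand(price, q0, alpha, k=2.0):
    """Exponentiated demand equation: predicted consumption at a given price.

    q0:    demand intensity (consumption when the drug is free)
    alpha: rate at which consumption falls as price rises (higher = more elastic)
    k:     range of consumption in log10 units (fixed here by assumption)
    """
    return q0 * 10 ** (k * (math.exp(-alpha * q0 * price) - 1))


def fit_demand(prices, consumption, k=2.0):
    """Least-squares grid search for q0 and alpha (a deliberately simple fitter)."""
    q0_grid = [0.5 * i for i in range(1, 61)]                 # 0.5 to 30 units
    alpha_grid = [10 ** (-4 + 0.05 * j) for j in range(61)]   # 1e-4 to 1e-1
    best_sse, best_q0, best_alpha = float("inf"), None, None
    for q0 in q0_grid:
        for alpha in alpha_grid:
            sse = sum((demand(p, q0, alpha, k) - q) ** 2
                      for p, q in zip(prices, consumption))
            if sse < best_sse:
                best_sse, best_q0, best_alpha = sse, q0, alpha
    return best_q0, best_alpha


# Simulated (not real) purchase-task responses from a hypothetical participant
prices = [0, 0.25, 0.5, 1, 2, 5, 10, 20, 50]
consumption = [demand(p, q0=10.0, alpha=0.01) for p in prices]
print(fit_demand(prices, consumption))
```

With noise-free simulated data the grid search recovers the generating values; real purchase-task data would call for a proper optimizer and checks on goodness of fit.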
A large body of evidence has shown that economic demand parameters are associated with addiction severity, measured using a wide range of metrics including the number of heavy drinking sessions per week, money spent on alcohol / tobacco, and questionnaire scores of drug use severity (e.g. Alcohol Use Disorders Identification Test [135] ; Fagerström Test for Nicotine Dependence [136] ). Cross-sectional studies with large cohorts of alcohol drinkers and tobacco smokers have revealed that addiction severity measures are positively related to demand intensity and inversely related to demand elasticity [137][138][139][140][141] . In other words, these findings suggest that people at risk of developing addiction may tend to assign greater subjective value to drugs of abuse and may be less likely to alter their drug use in the face of increasing (personal or monetary) costs. This observation has been replicated in heavy users of cocaine [ 142 , 143 ] and cannabis [144] , and similar observations have also been reported in rats [145] . Furthermore, economic demand parameters have been applied in treatment settings to identify patients who respond better to treatment. Several studies noted that treatment-seeking smokers and drinkers with lower economic demand before the intervention (i.e. at baseline) were more likely to benefit from behavioral interventions such as contingency management and motivational interviewing [146][147][148][149][150][151] . These findings suggest that the propensity to attribute greater value to drugs of abuse may be a key indicator of sensitivity to treatment, although the therapeutic effects (particularly of contingency management) are short-lived [ 152 , 153 ].
Although numerous studies show that economic measures of drug demand are associated with addiction vulnerability, one notable limitation is the lack of a clear etiological explanation of how the overvaluation of addictive drugs develops from chronic drug use. Other learning theories, as discussed in prior sections, seem to better account for how regular drug use may develop into addictive behavior, for example by subverting regulatory control systems or increasing incentive salience. A standalone behavioral economic framework, however, does not explain how long-term use of addictive drugs alters a drug's reinforcing efficacy, and the neurobiological processes underpinning this change remain elusive. It is likely that the overvaluation of drugs of abuse develops via associative learning mechanisms, as discussed in previous sections, but this hypothesis remains to be tested. Notwithstanding these limitations, the parameters of the economic model appear to be reliable predictors of addiction severity.

Summary and future outlook
We have briefly reviewed the advances of theory-driven computational models in drug addiction research, focusing specifically on how computational models have been applied to learning and decision-making theories of addiction. In our view, the primary contribution of theory-driven computational models is their ability to provide mechanistic, algorithmic descriptions of the latent processes that generate overt behavior. However, it is worth acknowledging that most existing computational studies in drug-addicted patients do not directly investigate cognitive processes within a drug-related context (e.g. drug-seeking behavior), largely owing to practical and ethical constraints. Nonetheless, this approach has given researchers an exciting new avenue for testing explicit hypotheses about the cognition that underpins addictive behavior.
Although we present several computational accounts that may explain addictive behaviors, it is important to note that drug addiction is a complex and multifaceted disorder, and that no single computational process can account for all of its aspects. Many drug-addicted patients cycle through stages of binge/intoxication, withdrawal/negative affect, and preoccupation/anticipation (craving) [ 4 , 154 ]. Each stage may conceivably involve distinct theoretical (and computational) processes, although these theories might also interact to varying extents [155] . For example, increased goal narrowing towards drugs (indexed by increased subjective value) may be exacerbated by impaired behavioral control (indexed by deficits in reinforcement learning / goal-directed control), which in turn leads to persistent drug use. Additionally, the heterogeneity within drug addiction should be considered when interpreting computational findings. For instance, patients addicted to different types of drugs may show different computational profiles: chronic use in patients addicted to psychostimulants may be perpetuated by a lack of avoidance behavior [ 21 , 46 ], whereas in opioid users it may reflect enhanced avoidance (e.g. lose-shift behavior) [156] . Other potential factors include variation in the duration of use, the pattern of drug use, and the age of onset of drug use, but these factors have not yet, to the best of our knowledge, been explored in the literature. Thus, any computational findings should be interpreted with the complexity and multidimensional nature of drug addiction in mind.
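Lose-shift behavior, cited above as an index of enhanced avoidance, is usually computed directly from trial-by-trial choices, without fitting a model. A minimal sketch follows; the function name and input format are hypothetical, and operational definitions can differ between studies.

```python
def win_stay_lose_shift(choices, rewarded):
    """Model-agnostic win-stay and lose-shift proportions.

    choices:  chosen option on each trial
    rewarded: True if that trial's choice was rewarded ('win'), else False
    Returns (p_win_stay, p_lose_shift); an entry is None when there were no
    trials of that type to condition on.
    """
    if len(choices) != len(rewarded):
        raise ValueError("choices and rewarded must have the same length")
    win_stay = wins = lose_shift = losses = 0
    # Each trial's outcome is paired with whether the NEXT choice repeats it
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if rewarded[t]:
            wins += 1
            win_stay += int(stayed)
        else:
            losses += 1
            lose_shift += int(not stayed)
    p_ws = win_stay / wins if wins else None
    p_ls = lose_shift / losses if losses else None
    return p_ws, p_ls


print(win_stay_lose_shift(["A", "A", "B", "B", "A"], [True, False, True, False, True]))
```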
It has been argued that a computational understanding of the generative processes behind addiction symptoms has important clinical utility [157] . Notable examples include the hope that these computational metrics can become the new frontier for dimensional psychiatry [158][159][160] , help detect or stratify vulnerable individuals (e.g. in a pre-addiction state [161] ), or identify those who respond well to treatment [14] . However, whilst computational psychiatry has refined our understanding of drug addiction symptoms, it has not, to the best of our knowledge, generated novel and testable psychological theories of addiction in humans. Further, the translation from computations to the clinic has been less successful [162] . Thus far, theory-driven computational models have yet to deliver any tangible metrics for clinical use, not least because of the lack of standardization and the low reliability of these measures [162] . We also observe an increasing trend for sophisticated mathematical models of psychiatric symptoms to lack any biological or psychological plausibility, which risks leaving them far removed from clinical relevance.
There is no doubt that the use of computational modeling in addiction research is here to stay. Improvements in the reliability and validity of these computational models in the field of addiction are therefore needed. Work along these lines has begun in healthy volunteers, for example by examining the reliability of computational models against model-agnostic behavioral data and by improving the reliability of model-fitting methods [163][164][165][166] . Particularly in the context of addiction, it is important to establish that the modelled computational parameters are indeed relevant to the behavioral process of interest. One straightforward way to do this is to assess the relationship between model parameters (e.g. 'stickiness') and the behavioral output they approximate (e.g. rate of perseveration) [167] . It is also pertinent to investigate how these computational parameters change over the course of the disorder, to establish their relevance to drug addiction. Although there is some recent evidence that computational parameters can predict treatment response [ 149 , 168 , 169 ], more longitudinal studies may be necessary to examine exactly how computational indices underpin the development and persistence of addictive behaviors [ 155 , 170 ]. Ultimately, the onus is on researchers to critically examine the biological and psychological plausibility of computational indices before applying them in research or clinical practice.
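The check described above, relating a model parameter to the behavior it is meant to capture, can be sketched by simulation: generate choices from a reinforcement learning agent at several values of a stickiness parameter and confirm that a model-agnostic perseveration measure (here, the rate of repeating the previous choice) rises with it. The agent, its parameter names and values, and the task (a two-armed bandit with equally rewarding arms, so that differences in repetition are not driven by one option being objectively better) are illustrative assumptions, not the procedure of [167].

```python
import math
import random


def simulate_repeat_rate(stickiness, n_trials=2000, alpha=0.3, beta=3.0,
                         p_reward=(0.5, 0.5), seed=0):
    """Simulate a Q-learning agent with choice stickiness on a two-armed bandit.

    Choice rule (softmax over two options):
        logit(a) = beta * Q[a] + stickiness * [a == previous choice]
    Returns the proportion of trials on which the previous choice was repeated,
    a model-agnostic index of perseveration.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]
    prev = None
    repeats = 0
    for _ in range(n_trials):
        logits = [beta * q[a] + (stickiness if a == prev else 0.0) for a in (0, 1)]
        p_choose_1 = 1.0 / (1.0 + math.exp(logits[0] - logits[1]))
        choice = 1 if rng.random() < p_choose_1 else 0
        if choice == prev:
            repeats += 1
        reward = 1.0 if rng.random() < p_reward[choice] else 0.0
        q[choice] += alpha * (reward - q[choice])  # delta-rule value update
        prev = choice
    return repeats / (n_trials - 1)


# With equally rewarding arms, repetition should rise with the stickiness parameter
for s in (0.0, 1.0, 2.0, 3.0):
    print(s, round(simulate_repeat_rate(s), 3))
```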

Declaration of Competing Interest
KDE was supported by an Alexander von Humboldt Fellowship for senior researchers (GBR 1202805 HFST-E) and receives editorial honoraria from Karger Publishers. TVL reports no conflict of interest.

Data availability
No data was used for the research described in the article.

Funding
This work has been supported by Dorothy Langton and her family, as well as an Angharad Dodds John Bursary in Mental Health and Neuropsychiatry to TVL, and the NIHR Cambridge Biomedical Research Centre.