Chemogenetic inhibition of ventral tegmental area dopamine neurons or their inputs to the nucleus accumbens disrupts cue-triggered reward seeking but not reward taking

Reward-paired cues acquire powerful incentive motivational properties that allow them to elicit reward-seeking behavior. This motivational influence is adaptive under normal conditions but can lead to compulsive behavior in addiction and related disorders. Here, we used pathway-specific chemogenetic inhibition of dopamine neurons to determine their role in mediating cue-triggered reward seeking vs. reward taking, which were parsed using a novel analytic approach based on microstructural analysis of the individual behaviors required for earning (lever pressing) and collecting rewards (food-cup approach), respectively. We found that inhibiting ventral tegmental area (VTA) dopamine neurons effectively abolished cue-motivated reward seeking, without impacting that cue’s ability to increase reward-taking behavior, suggesting that this pathway supports the response-invigorating effects of reward-paired cues, but not their ability to signal reward availability. Inhibiting these dopamine neurons also spared rats’ ability to select actions based on reward value, further defining the scope of this system’s involvement in the regulation of reward-seeking behavior. We also found that locally inhibiting dopamine projections to nucleus accumbens (NAc), but not to medial prefrontal cortex (mPFC), disrupted cue-triggered reward seeking. Interestingly, response suppression induced by inhibiting NAc dopamine inputs was more marked in rats that tended to respond to reward-paired cues by engaging in exploratory reward seeking, without trying to collect reward, than in rats that engaged in complete bouts of seeking and taking, suggesting such responses vary in their dependence on mesolimbic dopamine. These findings provide important new insights into the behavioral and neural mechanisms underlying adaptive and compulsive forms of cue-motivated behavior.

The mesocorticolimbic dopamine system is strongly implicated in both adaptive and maladaptive forms of motivated behavior (for review, Berridge and Robinson, 2016; Farrell et al, 2018; Ostlund and Halbout, 2017; Salamone and Correa, 2012), though there is much left to learn about its specific contributions to such behavior. Dopamine appears to be most important when one must adapt to challenges while pursuing a reward (Beeler et al, 2014; Salamone et al, 2012), such as when reward-seeking behavior is first learned (Choi et al, 2005; Wassum et al, 2012) or when there is an abrupt increase in the effort needed to earn reward (Ostlund et al, 2012). Dopamine also mediates the tendency for reward-paired cues to motivate reward seeking (Dickinson et al, 2000; Wassum et al, 2011), suggesting that it helps reinvigorate such behavior when environmental cues signal that it may be adaptive to do so.
Although reward-paired cues can invigorate reward seeking through a nonspecific motivational, or arousal, process (Rescorla and Solomon, 1967), they can also influence action selection by engaging associative cognitive processes (Kruse et al, 1983; for review, see Corbit and Balleine, 2016). For instance, it has recently been proposed that reward-paired cues are able to temporarily inflate the perceived efficacy of reward-seeking behavior, which in turn increases the likelihood that such actions will be performed (Cartoni et al, 2013; see also Rescorla, 1994). This account is particularly useful for explaining why reward-paired cues seem to be so effective in reinvigorating reward-seeking actions that have been extinguished (for review, Cartoni et al, 2016; Holmes et al, 2010) or have a low probability of reinforcement (Cartoni et al, 2015).
To determine the specific involvement of dopamine in cue-motivated behavior, we developed a novel method to selectively assay how reward-paired cues influence animals' perception of the efficacy of their reward-seeking actions, based on the microstructural organization of reward-seeking behavior. In particular, when rats come to doubt their ability to earn food reward by pressing a lever, they perform that action in a more exploratory manner, without attempting to collect, or take, food (Joel and Avisar, 2001).
We therefore propose that this reward-taking behavior (i.e., attempting to collect reward immediately after executing a reward-seeking action) reflects a rat's perception of the efficacy of their reward-seeking behavior. Given this, if reward-paired cues are able to inflate the perceived efficacy of reward-seeking behavior (e.g., lever pressing), then they should be effective in increasing the probability that individual reward-seeking actions are followed by reward-taking behavior (e.g., food-cup approach).
We used a Pavlovian-to-instrumental transfer (PIT) task (for review, Cartoni et al, 2016; Holmes et al, 2010) to assess the potentially dissociable influences of reward-paired cues on reward seeking and reward taking, and to disambiguate the involvement of dopamine in such processes. Previous studies using the PIT task have shown that cue-motivated reward seeking can be abolished by blocking dopamine receptors, either systemically (Dickinson et al, 2000; Ostlund and Maidment, 2011; Wassum et al, 2011) or locally in the nucleus accumbens (NAc) (Lex and Hauber, 2008), or by inactivating the ventral tegmental area (VTA) (Corbit et al, 2007; Murschall and Hauber, 2006). Importantly, if PIT depends on the ability of the reward-paired cue to inflate predictions about the efficacy of reward-seeking behavior, then it is possible that dopamine is directly involved in this process, in which case disrupting dopamine signaling should block cue-motivated reward-seeking and reward-taking behaviors. However, if dopamine contributes to an incentive motivational process that directly results in invigoration of reward-seeking behavior, then we would expect manipulations of dopamine signaling to spare the influence of reward-paired cues on reward-taking behavior.
bioRxiv preprint doi: https://doi.org/10.1101/272963; this version posted June 3, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
The medial prefrontal cortex (mPFC) also receives substantial input from a subpopulation of VTA dopamine neurons (Lammel et al, 2008). Food-paired cues elicit dopamine release in the mPFC (Bassareo and Di Chiara, 1997; Feenstra et al, 1999), and mPFC neurons exhibit task-related activity during PIT testing (Homayoun and Moghaddam, 2009). Despite such findings, it remains unknown if mPFC dopamine signaling is also involved in the expression of PIT performance, let alone whether its involvement is specific to the influence of reward-paired cues on reward seeking versus reward taking.
In the current study, we introduce an analytic approach to parse the effects of reward-paired cues on reward-seeking versus reward-taking behavior during PIT, and investigate how chemogenetically inhibiting mesocorticolimbic dopamine circuits (globally, or specifically their projections to NAc or mPFC) impacts these two distinct aspects of cue-motivated behavior. We also examined the influence of reward value on reward seeking and taking, and its dependence on mesocorticolimbic dopamine.

Animals:
In total, 93 Long-Evans Tyrosine hydroxylase (Th):Cre+ rats (hemizygous Cre+) (Mahler et al, 2018; Mahler et al, 2014; Witten et al, 2011) and wildtype (WT) littermates were used for this study. Subjects were at least 3 months of age at the start of the experiment and were single- or pair-housed in standard plexiglass cages on a 12 h/12 h light/dark cycle. Animals were maintained at ~85% of their free-feeding weight during behavioral procedures. All experimental procedures that involved rats were approved by the UC Irvine Institutional Animal Care and Use Committee and were in accordance with the National Research Council Guide for the Care and Use of Laboratory Animals.
Apparatus: Behavioral procedures took place in sound- and light-attenuated Med Associates chambers (St Albans, VT, USA). Each chamber was equipped with two retractable levers. Grain-based dustless precision pellets (45 mg, BioServ, Frenchtown, NJ, USA) were delivered into a food cup positioned in a recessed magazine between the levers. A photobeam detector was used to record food-cup approaches. Chambers were illuminated during all sessions.
Testing occurred at least 25 days after surgery to allow adequate time for viral expression of hM4Di throughout dopamine neurons, including in terminals within the NAc and mPFC.
Experiment 1: Microstructural analysis of reward seeking and taking under conditions of varying response-contingent feedback about reward delivery.
Instrumental learning. WT rats (n = 9) first underwent 2 d of magazine training (40 pellet deliveries in each 1-h daily session) before receiving 9 d of instrumental training. In each session, rats had continuous access to a single lever, which could be pressed to deliver food pellets into the food cup. The schedule of reinforcement was adjusted over days from continuous reinforcement (CRF) to increasing random intervals (RI), with one day each of CRF, RI-15s, and RI-30s, followed by 6 days of RI-60s. Each session was terminated after 30 min or after 20 reward deliveries.
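As a rough illustration of how a random-interval schedule with the session limits described above behaves, the following Python sketch may be useful. It is not the chamber control code used in the study: the function name `simulate_ri_session`, the exponentially distributed arming intervals, and the example press times are all assumptions made for illustration.

```python
import random


def simulate_ri_session(press_times, mean_interval=60.0, max_rewards=20,
                        session_length=1800.0, seed=0):
    """Return the press times (s) that would be reinforced.

    Assumption: a reward is "armed" after an exponentially distributed
    interval with the given mean, and the first press after arming is
    reinforced. The session ends after `session_length` s or `max_rewards`.
    """
    rng = random.Random(seed)
    rewarded = []
    # Time at which the next reward becomes available.
    armed_at = rng.expovariate(1.0 / mean_interval)
    for t in sorted(press_times):
        if t > session_length or len(rewarded) >= max_rewards:
            break
        if t >= armed_at:
            rewarded.append(t)
            # Start timing the next interval from the reinforced press.
            armed_at = t + rng.expovariate(1.0 / mean_interval)
    return rewarded


# Hypothetical rat pressing once per second for 30 min.
presses = [float(t) for t in range(1, 1801)]
rewarded = simulate_ri_session(presses)
```

Under these assumptions most presses go unreinforced, which is why the reinforced and nonreinforced presses of a single RI-60s session can be compared.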
Varying response-contingent feedback. Following training, rats were given a series of tests to assess the role of response-contingent feedback on reward-seeking and reward-taking behavior. Rats were given 3 tests (30 min each, pseudorandom order over days) during which lever pressing caused: 1) activation of the pellet dispenser to deliver a pellet into the food cup (RI-60s schedule), 2) activation of the pellet dispenser to deliver a pellet into an external cup not accessible to the rats, producing associated sound and
Experiments 2 and 3: Role of mesocorticolimbic dopamine in cue-motivated reward seeking and taking.
Pavlovian conditioning. Th:Cre+ rats (n = 64) underwent 2 d of magazine training, as in Experiment 1, before they received 8 daily Pavlovian conditioning sessions, each of which consisted of a series of 6 presentations of a 2-min audio cue (CS+; either 10 Hz tone or white noise; 80 dB), with trials separated by a variable 3-min interval (range 2-4 min). During each CS+ trial, pellets were delivered on a 30-s random time schedule, resulting in an average of 4 pellets per trial. Rats were separately habituated to an unpaired auditory stimulus (CS-; see Supplemental Materials and Methods).
Conditioning was assessed by comparing the rate of food-cup approaches during CS periods (from CS onset to first pellet delivery, to avoid detection of unconditioned feeding) with the rate during pre-CS periods.
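The conditioning measure described above can be sketched in Python as follows. This is a minimal illustration, not the authors' analysis code: the function names, the 120-s pre-CS window, and the example timestamps are assumptions, since the text does not state the pre-CS duration.

```python
def approach_rate(approach_times, start, end):
    """Food-cup approaches per minute in the half-open window [start, end)."""
    duration = end - start
    if duration <= 0:
        return float("nan")
    count = sum(1 for t in approach_times if start <= t < end)
    return 60.0 * count / duration


def cs_vs_pre_cs(approach_times, cs_onset, first_pellet, pre_cs_len=120.0):
    """Return (pre-CS rate, CS rate), truncating the CS window at the
    first pellet delivery so that feeding-related approaches are excluded.

    `pre_cs_len` is an assumed window length, not a value from the text.
    """
    pre = approach_rate(approach_times, cs_onset - pre_cs_len, cs_onset)
    cs = approach_rate(approach_times, cs_onset, first_pellet)
    return pre, cs


# Hypothetical timestamps in seconds, for illustration only.
approaches = [200.0, 305.0, 310.0, 318.0]
pre_rate, cs_rate = cs_vs_pre_cs(approaches, cs_onset=300.0, first_pellet=320.0)
```

In this made-up trial the pre-CS rate is 0.5 approaches per minute and the CS rate is 9 approaches per minute.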
Instrumental training. Following Pavlovian conditioning, all rats were given 9 d of instrumental training with lever pressing ultimately reinforced on an RI-60s schedule, as in Experiment 1.
Pavlovian-to-instrumental transfer test. After a session of Pavlovian conditioning, rats were given a 30-min extinction session, during which lever presses were recorded but had no consequence (i.e., no pellets or cues). On the next day, rats were given a PIT test, during which the lever was continuously available but produced no rewards. Following 8 min of extinction, the CS+ and CS− were each presented 4 times (2 min per trial) in pseudorandom order and separated by a 3-min fixed interval. Rats from Experiment 2 (effects of system-wide dopamine neuron inhibition) received CNO (5 mg/kg, i.p.) or vehicle (5% DMSO in saline) injection 30 min prior to the test. They were subjected to a second test following retraining (see Supplemental Materials and Methods), during which they were given the alternative drug pretreatment. Because microinjection procedures produced additional variability in task performance, rats in Experiment 3 (effects of inhibiting dopamine release in NAc and mPFC) underwent a total of 4 tests.
They received either CNO microinfusions (1 mM, 0.5 µL/side or 0.3 µL/side, for NAc and mPFC respectively) or vehicle (5% DMSO in aCSF) 5 min before the start of each test, and were given two rounds of testing each with CNO and vehicle (test order counterbalanced across other experimental conditions).
Devaluation testing. To selectively devalue one of the food outcomes prior to testing, all rats were satiated on grain pellets or sucrose solution by providing them with 90 min of unrestricted access to that food in the home cage. After 60 min of feeding, rats received CNO (5 mg/kg, i.p.) or vehicle injections. After an additional 30 min of feeding, rats were placed in the chamber for a test in which they had continuous access to both levers. The test began with a 5-min extinction phase (no rewards or cues), which was immediately followed by a 15-min reinforced phase, during which each action was reinforced with its respective outcome (CRF for the first 5 rewards, then RR-20). Rats were given a total of 4 devaluation tests, 2 after CNO and 2 after vehicle, alternating the identity of the devalued outcome across the 2 tests in each drug condition (test order counterbalanced across training and drug conditions).
Analysis: Lever presses were used to operationally define reward-seeking behavior. Because rats must approach the food cup in order to collect (take) any food that their reward seeking may have produced, we used this behavior to identify reward taking. We based our specific definition of reward taking on the natural microstructural pattern of food-cup approaches that surrounded individual lever presses, which was significantly elevated for the first 2.5 sec of the post-press period (Figure S1; see also Marshall and Ostlund, in press). Given the inherent contingency between these measures of seeking and taking, we focused our analysis on the probability that a reward-seeking action was followed by a bout of reward taking (i.e., at least one food-cup approach within 2.5 sec). This measure (total reward-taking bouts / total reward-seeking actions) controls for variability across conditions in the opportunity that rats had to engage in reward taking. We operationally defined the subset of reward-seeking actions that were not directly followed by reward taking as "exploratory" reward-seeking actions. Statistical analysis employed two-sided t-tests and general(ized) linear mixed-effects models (Pinheiro and Bates, 2000), the latter of which allows for simultaneous parameter estimation as a function of condition (fixed effects) and the individual rat (random effects).
Mixed-effects models were fit using the Statistics and Machine Learning Toolbox in MATLAB, which provides t-values reflecting the statistical significance of each regression coefficient relative to the population mean (i.e., simple effects). These simple effects are indicative of main effects and interactions when a factor has at most two levels. For factors with at least 3 levels, F-tests were conducted to reveal the overall significance of the effect or interaction(s) involving this factor. A full description of the mixed-effects model analysis is available in Supplementary Materials and Methods.
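The reward-taking measure defined in this section (total reward-taking bouts / total reward-seeking actions) can be sketched in Python as follows. This is an illustrative reconstruction rather than the authors' code: the function names and example timestamps are hypothetical, and the sketch assumes that the 2.5-sec window is inclusive and that a single food-cup approach may count toward more than one preceding press.

```python
from bisect import bisect_left

TAKING_WINDOW_S = 2.5  # post-press window given in the text


def classify_presses(press_times, approach_times, window=TAKING_WINDOW_S):
    """Label each press True if a food-cup approach follows within `window` s."""
    approaches = sorted(approach_times)
    labels = []
    for press in press_times:
        # Index of the first approach at or after the press.
        i = bisect_left(approaches, press)
        followed = i < len(approaches) and approaches[i] - press <= window
        labels.append(followed)
    return labels


def taking_probability(press_times, approach_times, window=TAKING_WINDOW_S):
    """Return (P(taking | seeking), number of exploratory presses)."""
    labels = classify_presses(press_times, approach_times, window)
    if not labels:
        return float("nan"), 0
    taking_bouts = sum(labels)
    return taking_bouts / len(labels), len(labels) - taking_bouts


# Hypothetical timestamps in seconds, for illustration only.
presses = [10.0, 14.0, 30.0, 31.0]
approaches = [11.2, 20.0, 33.0]
prob, exploratory = taking_probability(presses, approaches)
```

Here the presses at 10.0 s and 31.0 s are followed by an approach within the window, so the probability of taking given seeking is 0.5 and the remaining two presses are counted as exploratory.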

Microstructural characteristics of reward seeking and taking: effects of response-contingent feedback about reward delivery
We first characterized how the omission of response-contingent reward impacts the microstructural organization of instrumental reward-seeking and reward-taking actions, which is an important first step in understanding how reward-paired cues influence such behavior. We analyzed the microstructural relationship between lever pressing (RI-60s schedule) and food-cup approach behavior to collect food pellets (see Methods and Figure 1A for illustration).
As seen in Fig 1B, rats were more likely to approach the food cup within a few seconds of performing a lever-press response than they were at other times.
Not surprisingly, rats were much more likely to approach the food cup when a lever press was reinforced than when it was not (t(8) = 19.33, p < .001), suggesting that they could detect the pellets being delivered based on sound and tactile cues produced by dispenser activation. Food-cup approaches also increased following nonreinforced presses (Fig 1B, Fig 1C), indicating that rats also made sporadic, uncued attempts to take rewards that were not actually delivered. To explore the factors controlling cued and uncued reward taking, rats were given a series of tests in which lever pressing produced either 1) pellet dispenser cues and pellet delivery, 2) pellet dispenser cues only (no pellet delivery), or 3) no pellet dispenser cues or pellet delivery (extinction). Reward-taking actions were defined as at least one food-cup approach response occurring within 2.5 sec of a lever press (see Fig S1 and Materials and Methods for further details). Given the inherent contingency between reward seeking and reward taking, we focused our analysis on the proportion of all reward-seeking actions (individual lever presses) that were followed by a reward-taking action (Fig 1D and 1E). We found that this measure of reward taking was highly sensitive to manipulations of reinforcement contingency. Reward taking was much more likely to occur after reinforced than nonreinforced trials, regardless of whether the pellet dispenser cues were presented alone or together with food delivery (Fig 1F; ts(8) ≥ 13.74, ps < .001). When lever presses had no consequence at all (i.e., during the extinction test), rats rarely followed such actions with an attempt to collect food from the cup (Fig 1F). These data generally support the view developed above that animals adapt to reward omission by applying a more exploratory reward-seeking strategy that normally involves withholding any subsequent attempt to collect reward, as if they no longer perceived this reward-seeking action to be a reliable way to produce reward.

Inhibiting dopamine neurons during Pavlovian-to-instrumental transfer preferentially disrupts cue-motivated reward seeking, but not taking
We  These findings indicate that inhibiting dopamine neurons disrupts the ability of a reward-paired cue to motivate reward-seeking behavior, so we next investigated the microstructural relationship between lever pressing and food-cup approach behavior to assess how the CS+ impacted the perceived efficacy of reward-seeking action and whether this influence was also sensitive to dopamine inhibition (Fig 3E and 3F). As in Experiment 1, we focused our analysis on the proportion of reward-seeking actions that were immediately followed by a food-cup approach to normalize for differences in  S2 for analysis of reward taking frequency). This measure revealed that the CS+ (p < .001) but not the CS- (p = .501) increased the probability that rats would attempt to take a reward after lever pressing. There was no evidence of an effect of dopamine inhibition on this CS+ induced increase in reward taking (interactions involving Drug or Group, ps > .109; Table S3). Therefore, although noncontingent CS+ presentations caused rats to act as if they expected their reward-seeking behavior was once again effective in producing imminent reward delivery, this influence did not depend on VTA dopamine neuron activity.

Pathway-specific inhibition of dopamine projections to NAc, but not mPFC, disrupts cue-motivated reward seeking in Pavlovian instrumental transfer
As previously reported (Mahler et al, 2018) (Lichtenberg et al, 2017; Mahler et al, 2014; Stachniak et al, 2014), an approach previously shown to be effective in inhibiting these specific dopamine projections (Mahler et al, 2018). Figure 4C shows that, in hM4Di expressing rats, the CS+ specific increase in reward seeking was disrupted by CNO in a manner that depended on the microinjection site (Drug * CS Period * CS Type * Site interaction, t(240) = -2.99, p = .003; Table S4). As in the previous experiment, the CS+ was also effective in increasing the probability that rats would engage in reward taking after performing a reward-seeking action (Fig 4E). Specifically, we found that the CS+ (p < .001) increased the probability that a reward-seeking action was followed by an attempt to take reward, which was not affected by CNO at either site (ps > .187; Table S6; see Fig S5 and Table S5 for analysis of reward taking frequency). The CS- produced a decrease in the probability of reward taking given reward seeking compared to both pre-CS baseline (p = .040) and the CS+ (p < .001).
Given that inhibiting dopamine terminals in the NAc resulted in only a partial suppression of CS+ motivated reward seeking, but not its ability to promote reward taking, we hypothesized that variability in rats' sensitivity to the suppressive effects of this treatment may be explained by individual differences in their tendency to respond to the CS+ by engaging in exploratory bouts of reward seeking (i.e., presses that were not followed by an immediate attempt to take reward). Consistent with this, we found that for NAc rats, CS+ elicited increases in exploratory reward seeking (CS+ - pre-CS+) during the vehicle test were negatively correlated with CNO-induced suppression of reward seeking (all presses; r = -0.81, p = .027; Fig 4F). No such relationship was seen between exploratory seeking on vehicle day and sensitivity to CNO in mPFC rats (r = -0.19, p = .618).
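The individual-differences analysis above relates two per-rat scores through a correlation coefficient. Assuming the reported r is a Pearson correlation (the text does not say), it can be sketched in Python as follows. The function name is hypothetical, and how each per-rat score was derived is not reproduced here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)
```

In use, `x` would hold each rat's CS+ minus pre-CS+ change in exploratory presses on the vehicle test and `y` the same rat's CNO effect on total presses, with one entry per rat in matching order.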

Inhibiting dopamine neurons spares use of outcome value when selecting between reward-seeking actions
We also investigated the role of the mesocorticolimbic dopamine system in an important aspect of uncued, goal-directed behavior: the ability to flexibly choose between actions based on the current value of the rewards they produce. Sensitivity to changes in the value of food rewards, induced by sensory-specific satiety, was assessed in a separate set of rats expressing mCherry or hM4Di in VTA dopamine neurons, following administration of CNO (5 mg/kg) or vehicle (Fig 5A). We found that rats pressed significantly less for the devalued food than for the non-devalued food (t(148) = -5.41, p < .001; S7). CNO treatment did not significantly affect sensitivity to devaluation in either hM4Di or mCherry rats (ps ≥ .146), indicating that this aspect of goal-directed decision making is not disrupted by dopamine neuron inhibition. Inhibiting VTA dopamine neurons also failed to disrupt sensitivity to devaluation during reinforced testing (see Fig S6). Interestingly, devaluation produced a modest but significant increase in the proportion of reward-seeking actions that were followed by reward taking (p = .040), suggesting that once rats initiated a reward-seeking action, they were at least as likely to attempt to take a devalued reward as a nondevalued reward (i.e., when a reward is judged to be worth pursuing, it is also worth trying to collect). This measure of reward taking was not sensitive to dopamine inhibition (Fig 5C; ps ≥ .109; Table S9; see Fig S7 and Table S8 for analysis of reward taking frequency).

Discussion
Our findings support the view that reward-taking behavior (i.e., attempting to collect food immediately after lever pressing) scales with rats' perception of the efficacy of their reward-seeking behavior and that reward-paired cues influence this specific aspect of behavior, as if they had reinstated the rats' expectation of response-contingent reward. Importantly, we found that inhibiting VTA dopamine neurons abolished cue-motivated reward seeking but not taking. Specifically, chemogenetic inhibition of dopamine neurons attenuated the ability of a food-paired CS+ to increase reward seeking, even though this cue continued to increase the likelihood that the few remaining presses were followed by an immediate attempt to take reward by approaching the food cup. A similar, albeit less pronounced, deficit in cue-motivated reward seeking was also observed when we inhibited dopamine terminals in the NAc, and, again, the ability of the CS+ to increase reward taking was spared. Inhibiting dopamine projections to the mPFC had no effect on either of these behavioral effects of the CS+. Dopamine pathway inhibition also failed to alter the sensitivity of instrumental reward seeking to outcome devaluation, demonstrating the remarkable specificity of its role in cue-triggered incentive motivation.
The lack of effect of mPFC dopamine inhibition on cue-motivated reward seeking (or taking) reported here is notable given electrophysiological findings implicating this region in the expression of PIT (Homayoun et al, 2009), though it should be noted that lesion studies of the mPFC have typically found little evidence that it is necessary for this aspect of behavior (Cardinal et al, 2003; Corbit and Balleine, 2003b). Such findings suggest that the mPFC may be engaged during this task, but does not normally play a causal role in the expression of PIT.
Our finding that inhibiting VTA dopamine neurons did not impact rats' sensitivity to outcome devaluation is consistent with previous findings demonstrating that this aspect of goal-directed decision making is dopamine-independent (Dickinson et al, 2000; Lex and Hauber, 2009; Lex and Hauber, 2010; Wassum et al, 2011), even though the NAc and mPFC make important contributions to this process (Bradfield and Balleine, 2017). Although our findings suggest that mesocorticolimbic dopamine signaling is not normally required for this aspect of cognitive control, they do not rule out its involvement when goal-directed decisions are more complex and/or require greater cognitive resources (Cools, 2015; Floresco, 2013; Westbrook and Braver, 2016).
Although PIT studies typically focus on the excitatory influence of reward-paired cues on reward-seeking behavior, previous studies (Corbit and Balleine, 2003a; LeBlanc et al, 2012) using more complex action sequence tasks (e.g., press left lever → press right lever → food) have shown that such cues can also strongly invigorate an action that is needed to complete the instrumental contingency (i.e., the right lever press), which might therefore be considered a reward-taking behavior in that it is proximal to reward delivery.
Here, we introduce an analytic approach allowing detection of cue-motivated changes in reward taking, distinct from reward seeking, using more conventional, well-characterized PIT testing procedures. This approach may help address a long-standing debate about the processes controlling food-cup approach behavior when animals are engaged in instrumental reward seeking. For instance, during PIT testing, it is generally assumed that food-cup approaches during the CS+ reflect a Pavlovian conditioned response (Holland and Gallagher, 2003; Holmes et al, 2010; Homayoun et al, 2009; Pecina et al, 2006), even though approaching the food cup is in fact required for collecting response-contingent food rewards and may therefore be instrumental in nature. The present PIT data show that food-cup approaches that occur during CS+ presentations are heterogeneous, with some occurring immediately after an instrumental lever press (attempted reward-taking actions) and others occurring at other times, most likely reflecting more conventional Pavlovian conditioned responses (see Marshall and Ostlund, in press). We found that these instrumental reward-taking food-cup approaches were more likely to occur when a reward was expected, such as when response-contingent feedback signaled the delivery of reward, supporting the view that these responses relate to the perceived efficacy of the reward-seeking action.
We show that these distinct excitatory effects of the reward-paired cue on reward seeking and taking are neurochemically dissociable, such that inhibition of dopamine neurons, or their projections to NAc, disrupts the cue's influence on seeking, without impacting its influence on taking. This is consistent with findings that dopamine supports a general motivational process that facilitates the expression of preparatory, or exploratory, reward-seeking behavior, but is relatively unimportant for consummatory, or reward-taking, actions (Fibiger and Phillips, 1986; Ikemoto and Panksepp, 1996; Veeneman et al, 2012). For example, fast-scan cyclic voltammetry studies reveal that the mesolimbic dopamine system is strongly engaged during the initiation of reward seeking, but not during reward taking (Cacciapaglia et al, 2012; Collins et al, 2016; Klanker et al, 2015; Wassum et al, 2012). Likewise, our own prior studies have shown that phasic dopamine release in the NAc is correlated with the vigor of cue-motivated lever pressing (Ostlund et al, 2014; Wassum et al, 2013) and not food-cup approaches (Aitken et al, 2016), and that individual transient dopamine release events are temporally correlated with the execution of individual lever-press responses (Ostlund et al, 2014).
Altogether, such findings support the view that dopamine's role in regulating the pursuit of rewards varies as a function of reward proximity, contributing predominantly to the initiation of reward-seeking actions, rather than to their completion (at least under low effort conditions, as tested here).
The present results may also have implications for understanding the role of dopamine in pathologies of behavioral control such as obsessive-compulsive disorder (OCD). In the signal attenuation model of OCD (Joel et al, 2001; Ostlund et al, 2014), rats learn that they can no longer rely on response-contingent cues to signal whether or not their reward-seeking behavior has been successful. When this happens, the logical organization of their reward-seeking and reward-taking behavior disintegrates, and rats come to exhibit excessive reward seeking, typically without attempting to take reward. Importantly, blocking D1-dopamine receptors disrupts expression of these compulsive (incomplete) bouts of reward seeking, without affecting the production of complete bouts of seeking and taking, which continue to be infrequently performed on some test trials even though these rats no longer have reason to be confident in the efficacy of their reward-seeking behavior (Joel and Doljansky, 2003). Considered in this light, our findings suggest that mesolimbic dopamine circuits may mediate the tendency for reward-paired cues to promote this exploratory, or even compulsive, form of reward seeking, rather than more purposeful attempts to pursue reward. Interestingly, we found that the rats showing the greatest propensity to respond to the CS+ by exploratory seeking without taking were the same rats showing the greatest suppressive effect of NAc dopamine inhibition. Future research is warranted to further investigate this link between mesolimbic dopamine signaling and exploratory cue-motivated reward seeking, which may reflect a behavioral marker that is relevant to understanding compulsive disorders like OCD and addiction (Joel et al, 2008; Robinson et al, 2014).

Funding and Disclosure
Our work is supported by National Institutes of Health (NIH) grants MH106972, DK098709, AG045380, and DA029035. The authors declare no competing interests.
Figure 1. Microstructural organization of instrumental behavior. A. Hungry rats were trained to perform a self-paced task (RI-60s schedule of reinforcement) in which "reward-seeking" lever presses could be followed by immediate food-cup approaches to collect the food (i.e., "reward taking"). B. Probability of food-cup approaches as a function of time surrounding a reinforced or nonreinforced lever press. C. Representative organization of instrumental behaviors for an individual rat showing reward-seeking and -taking actions for each reinforced or nonreinforced press. Each line represents a reinforced trial during RI-60s training. D, E. Effects of manipulating the reinforcement contingency on the organization of instrumental actions. Total reward-seeking (D) or -taking (E) actions at tests during which lever pressing was reinforced under an RI-60s schedule with pellet delivery (Food), reinforced with cues associated with food pellet delivery (Cue; i.e., activation of the pellet dispenser), or not reinforced (Extinction). F. The probability of reward taking given a reward-seeking action is highly dependent on the reinforcement contingency. Attempted reward taking was much more likely to occur after reinforced than nonreinforced trials, regardless of whether reinforcement was with Food or Cue. Rats rarely followed a lever press with an attempt to collect food from the cup in nonreinforced (including Extinction) conditions.

show a significant effect of CNO only in hM4Di-expressing rats. *p<0.05 vs 0. D. Proportions of reward-seeking actions that were immediately followed by an attempt to take the reward during different PIT periods do not differ as a function of treatment or group, but were increased during CS+ as opposed to CS- or pre-CS periods. E. Representative organization of instrumental behaviors in reward-taking and -seeking actions in PIT. Data show responses for two individual Th:Cre+ rats expressing mCherry and receiving vehicle treatment during pre-CS and CS presentation periods at test.
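The microstructural measure used in these captions, the probability that a reward-seeking action (lever press) is immediately followed by a reward-taking attempt (food-cup approach), can be illustrated with a minimal Python sketch. The function names, the example timestamps, and the 2.5-s window defining "immediately" are hypothetical choices made for illustration only; the criterion actually used in the study is not stated in this section.

```python
from bisect import bisect_right

def classify_presses(press_times, approach_times, window_s=2.5):
    """Label each lever press True if a food-cup approach follows it
    within window_s seconds, otherwise False.

    window_s is an assumed, illustrative criterion for "immediately".
    """
    approaches = sorted(approach_times)
    labels = []
    for t in press_times:
        # Index of the first approach occurring strictly after the press.
        i = bisect_right(approaches, t)
        labels.append(i < len(approaches) and approaches[i] - t <= window_s)
    return labels

def p_taking_given_seeking(press_times, approach_times, window_s=2.5):
    """Proportion of presses followed by an approach (NaN if no presses)."""
    labels = classify_presses(press_times, approach_times, window_s)
    return sum(labels) / len(labels) if labels else float("nan")

# Hypothetical event times in seconds, not data from the study.
presses = [1.0, 5.0, 9.0, 20.0]
approaches = [1.8, 12.5, 20.4]

labels = classify_presses(presses, approaches)
proportion = p_taking_given_seeking(presses, approaches)
print(labels, proportion)
```

Under this simplification, presses labeled False would count as exploratory seeking (seeking without taking) and presses labeled True as complete seeking-and-taking bouts. The sketch does not check whether another press intervenes before the approach.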

Experiment 4:
Role of mesocorticolimbic dopamine in goal-directed decision making: Instrumental Training. Th:Cre+ rats (n = 20) began with 2 d of magazine training, during which they received 20 grain pellets and 20 liquid sucrose rewards (0.1 mL of 20% sucrose solution, w/v) on a random 30-s schedule. Rats were then given 10 d of instrumental training on two distinct action-outcome contingencies (e.g., left lever press → grain; right lever press → sucrose) on a training schedule that gradually shifted to a random ratio 20 (RR-20) schedule of reinforcement (see Supplemental Materials and Methods for details).
next conducted a PIT experiment to investigate the microstructural organization of cue-motivated reward seeking and taking, and to determine how dopamine contributes to such effects. Rats with dopamine neuron-specific expression of hM4Di or mCherry (Fig 2) were trained on the PIT task (Fig 3A), which consisted of a Pavlovian conditioning phase, in which two different auditory cues were paired (CS+) or unpaired (CS-) with food pellets, and a separate instrumental training phase, in which rats were trained to lever press for pellets as in Experiment 1. During PIT testing, we noncontingently presented the CS+ and CS- while rats freely engaged in nonreinforced reward seeking and taking. Pretreatment with the DREADD ligand CNO suppressed reward seeking specifically during CS+ periods in hM4Di but not mCherry rats (Fig 3B; significant Group * Drug * CS Period * CS Type interaction, t(240) = -3.14, p = .002; see Supplementary Information reward-taking opportunities (Fig 3E; see Fig S2 and Table , hM4Di expression in VTA dopamine neurons resulted in transport of DREADDs to axonal terminals in the NAc and mPFC (Fig 2). To investigate the roles of these two pathways in cue-motivated behavior, we conducted a PIT experiment in a separate set of rats with hM4Di or mCherry infection of VTA dopamine neurons and guide cannulae aimed at the NAc or mPFC (Fig 4A; Fig S3). These rats underwent training and testing for PIT, as described above (Fig 4B), but were pretreated with intra-NAc or intra-mPFC injections of CNO (1 mM) or vehicle to achieve local inhibition of neurotransmitter release

Figure 3. Chemogenetic inhibition of dopamine neurons in Pavlovian-to-instrumental transfer (PIT). A. Experimental design: Following viral vector injections and recovery, rats received Pavlovian training, during which they learned to associate an auditory cue (CS+) with food pellet delivery. During instrumental learning, rats learned to lever press according to the same task used in Experiment 1. Lever pressing was extinguished (Ext) before rats were given a Pavlovian-to-instrumental transfer test, during which the CS+ (or a control CS-) was randomly presented. We studied the effect of CNO pretreatment (0 or 5 mg/kg; within-subject design) on reward seeking and reward taking at test. B. Chemogenetic inhibition of VTA dopamine neurons disrupts cue-motivated reward seeking. Reward-seeking actions in rats expressing the inhibitory DREADD hM4Di or mCherry following vehicle (left) or CNO (5 mg/kg, right) treatment prior to test. Grey bars represent instrumental actions during the pre-CS period; red bars represent reward-seeking actions during CS presentation. C. PIT expression is specifically impaired in hM4Di-expressing Th:Cre+ rats. Left panel, analysis of PIT scores after vehicle treatment (reward seeking during CS minus reward seeking during pre-CS) shows a significant elevation of reward-seeking behavior during the CS+ period for both groups; right panel, analysis of the CNO suppression score (PIT score following vehicle treatment minus PIT score following CNO treatment)

Figure 4. Pathway-specific chemogenetic inhibition of dopamine in PIT. A. Th:Cre+ rats initially received VTA AAV-hSyn-DIO-hM4Di-mCherry injections and were implanted with guide cannulae aimed at the nucleus accumbens (NAc) or medial prefrontal cortex (mPFC) for microinjection of CNO (1 mM) or vehicle (veh) to inhibit dopamine release at terminals at test. B. Following surgery, rats underwent PIT training and testing as described above (Pavlovian: Pavlovian Learning; Instr: Instrumental Learning; Ext: Extinction). We analyzed the microstructural organization of behavior (reward-seeking and reward-taking actions) at test. C. Pathway-specific inhibition of dopamine release in the NAc, but not the mPFC, disrupts reward seeking in PIT. Reward-seeking actions during PIT in rats expressing the inhibitory DREADD hM4Di and receiving CNO or vehicle microinfusions in either the NAc or mPFC. Grey bars represent instrumental actions during the pre-CS period; red bars represent reward-seeking actions during CS presentation. D. PIT expression is specifically impaired following NAc CNO treatment. Left panel, analysis of PIT scores following vehicle treatment (reward seeking during CS minus reward seeking during pre-CS) shows a significant elevation of reward-seeking behavior during the CS+ period for both groups; right panel, analysis of the CNO suppression score (PIT score following vehicle treatment minus PIT score following CNO treatment) shows a significant effect of CNO only when injected in the NAc. *p<0.05 vs 0. E. Proportions of reward-seeking actions that were immediately followed by an attempt to take the reward during different PIT periods do not differ as a function of treatment or group, but were increased during CS+ as opposed to CS- or pre-CS periods. F. Scatter plot showing the relationship between individual differences in the effect of CS+ on exploratory reward-seeking actions in the vehicle condition (Δ Exploratory Seeking; i.e., presses not followed by an attempt to collect reward) and the suppressive effect of CNO on reward seeking. Data points represent individual rats receiving intra-mPFC (left panel) or intra-NAc (right panel) CNO microinjections.
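The two derived measures defined in this caption, the PIT score (reward seeking during the CS minus reward seeking during the pre-CS period) and the CNO suppression score (vehicle PIT score minus CNO PIT score), reduce to simple differences. The Python sketch below illustrates the arithmetic only; the function names, rat labels, and press counts are hypothetical and are not data from the study.

```python
def pit_score(cs_presses, pre_cs_presses):
    """PIT score: reward seeking during the CS minus the pre-CS period."""
    return cs_presses - pre_cs_presses

def cno_suppression_score(vehicle, cno):
    """CNO suppression score: vehicle PIT score minus CNO PIT score.

    Each argument is a (cs_presses, pre_cs_presses) pair for one rat.
    Positive values mean CNO reduced the cue-evoked increase in pressing.
    """
    return pit_score(*vehicle) - pit_score(*cno)

# Hypothetical per-rat press counts: (during CS+, during pre-CS).
rats = {
    "rat_a": {"vehicle": (12, 4), "cno": (5, 4)},
    "rat_b": {"vehicle": (9, 3), "cno": (8, 3)},
}

scores = {
    name: cno_suppression_score(d["vehicle"], d["cno"])
    for name, d in rats.items()
}
print(scores)
```

In this made-up example, rat_a shows a larger suppression score than rat_b because CNO nearly abolishes its cue-evoked increase in pressing while leaving pre-CS pressing unchanged.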

Figure 5. Chemogenetic inhibition of dopamine neurons in the outcome-specific devaluation test. A. Th:Cre+ rats received VTA injections of AAV-hSyn-DIO-hM4Di-mCherry or AAV-hSyn-DIO-mCherry. Following recovery, rats were trained on two distinct lever-press actions for two different rewards (Instrumental Learning). Rats then underwent outcome-specific devaluation testing following treatment with CNO (5 mg/kg) or vehicle. B. Chemogenetic VTA dopamine inhibition does not affect outcome-specific devaluation. Total reward-seeking actions at test on the valued (red bars) and devalued (grey bars) levers in hM4Di- or mCherry-expressing Th:Cre+ rats, following CNO (5 mg/kg)

Table S1 for full generalized linear mixed-effects model output). Control rats in the mCherry group exhibited elevated lever pressing during CS+ trials (relative to pre-CS response rates; CS Period * CS Type interaction, p < .001), and this effect was not altered by CNO (Drug * CS Period * CS Type interaction, p = .780). In contrast, CNO pretreatment disrupted expression of CS+ motivated reward seeking in the hM4Di group (Drug * CS Period * CS Type interaction, p < .001). Specifically, hM4Di rats showed a CS+ specific elevation in lever pressing when pretreated with vehicle (CS Period * CS Type interaction, p < .001) but not CNO (CS Period * CS Type interaction, p = .684). The increase in lever pressing during the CS+ (PIT score: CS+ minus pre-CS) was significant for both vector groups when pretreated with vehicle (one-sample t-tests; ps < .001; Fig 3C), but was significantly suppressed by CNO in the hM4Di group (t(17) = -3.83, p = .001), and not in the mCherry group (t(13) = -1.21, p = .249; Fig 3C).
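The one-sample t-tests reported above compare a group's scores against zero. As a hedged illustration of that computation, and not a reconstruction of the authors' analysis pipeline, the following Python sketch computes the t statistic and degrees of freedom using only the standard library. The function name and the example scores are hypothetical; obtaining a p value would additionally require the t distribution, which this sketch omits.

```python
import math
import statistics

def one_sample_t(scores, mu0=0.0):
    """Return (t, df) for a one-sample t-test of mean(scores) against mu0.

    t = (mean - mu0) / (sd / sqrt(n)), using the sample standard
    deviation (n - 1 denominator); df = n - 1.
    """
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1

# Hypothetical PIT scores for four rats, not data from the study.
example_scores = [2.0, 4.0, 6.0, 8.0]
t_stat, df = one_sample_t(example_scores)
print(f"t({df}) = {t_stat:.2f}")
```

Because df = n - 1 for this test, the reported t(17) and t(13) are consistent with 18 and 14 rats in the hM4Di and mCherry groups, respectively.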