Probing the role of reward expectancy in Pavlovian-instrumental transfer

Reward-paired cues acquire motivational properties which allow them to invigorate instrumental reward-seeking behavior — termed Pavlovian-instrumental transfer. Somewhat surprisingly, this motivational influence is greatest for cues that serve as unreliable or otherwise weak predictors of reward. In this review, we delve into the ongoing debate about why weak and strong reward-predictive cues differ in their motivational effects. We outline evidence that, when presented with a strong reward predictor, rats exert cognitive control over their motivation to seek out new rewards to allow for efficient reward retrieval, an effect modulated by the expected probability, timing, and magnitude of reward. We also review recent research applying this approach to study how cue-motivated behavior becomes dysregulated in drug addiction and adolescent development.

Pavlovian learning is essential when foraging for rewards. Cues that signal reward availability help us decide which goals to pursue and how to pursue them, including how much time and effort to expend. But cues can also lead us astray, particularly in a world that is replete with highly desirable but ultimately dangerous rewards (e.g. junk food, addictive drugs). Such cues can elicit a powerful impulse to act that is nearly impossible to overcome, leading to out-of-control reward seeking [1][2][3]. Losing the motivation to pursue adaptive rewards is also problematic, resulting in apathy and diminished self-care [4,5]. Thus, there is a great need to identify the mechanisms regulating cue-motivated behavior and how they go awry.
Along with acquiring potent motivational properties, reward-predictive cues relay detailed cognitive information about upcoming reward. Yet much remains unclear as to how motivational and cognitive processes regulate reward pursuit, including how specific parameters (e.g. reward probability, timing, and magnitude) influence such behavior. Do they directly determine the motivational value assigned to predictive cues, or do they indirectly shape how motivation is translated into action? Answering such questions is not straightforward.

Pavlovian-instrumental transfer
The Pavlovian-instrumental transfer (PIT) paradigm is arguably the most rigorous and selective assay of cue-elicited motivation [6][7][8]. During initial Pavlovian conditioning, animals receive repeated pairings between an auditory conditioned stimulus (CS) and a palatable food reward. Next, during instrumental conditioning, animals are trained to perform an uncued, self-paced lever-press response for food reward. The final test phase assesses a defining feature of motivational cues, namely their ability to transfer motivational control to an independent reward-seeking action. This is done by scheduling occasional, noncontingent presentations of the CS while animals freely perform the lever-press response in the absence of reward. Because animals are never trained to lever press during the CS, the PIT effect (i.e. the elevation in lever pressing observed during the CS) is generally assumed to reflect the motivational properties of that cue, rather than its ability to elicit an explicitly learned response (e.g. a stimulus-response habit or conditioned reflex).
An often-overlooked feature of PIT is that strong reward-predictive cues are not effective motivators of instrumental behavior. Conventional PIT protocols [9][10][11][12] were designed to maximize the response-invigorating influence of the CS by using relatively long duration cues that are loosely paired with unpredictable reward delivery (e.g. reward delivery based on a random time (RT) 30-s schedule during 2-min CS presentations). In contrast, short cues that reliably signal imminent reward have little effect and may even suppress instrumental behavior [12][13][14][15].
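To make the conventional schedule concrete, the sketch below works out how much reward such a cue signals and how diffuse that signal is. It assumes, purely for illustration, that the RT 30-s schedule is implemented as an independent 1-in-30 chance of reward in each second of the cue; actual protocols may implement the schedule differently, and the function names are hypothetical.

```python
def expected_rewards(cue_duration_s: float, mean_interval_s: float) -> float:
    """Mean number of rewards delivered during one cue on a random-time schedule."""
    return cue_duration_s / mean_interval_s


def p_reward_within(window_s: int, mean_interval_s: float) -> float:
    """Chance of at least one reward in the next `window_s` seconds,
    assuming an independent 1/mean_interval_s chance in every second."""
    p_per_second = 1.0 / mean_interval_s
    return 1.0 - (1.0 - p_per_second) ** window_s


# Conventional protocol from the text: RT 30-s schedule during a 2-min CS.
rewards_per_cue = expected_rewards(120, 30)
p_next_10s = p_reward_within(10, 30)

print(f"expected rewards per 2-min CS: {rewards_per_cue:.1f}")
print(f"P(reward in any given 10-s window): {p_next_10s:.2f}")
```

Under this assumption the cue is paired with about four rewards on average, yet the chance of reward in the next few seconds is the same at every point in the cue, so the cue never signals that reward is imminent.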
Such findings do not readily fit with prevailing theories, which assume that the motivational properties of cues are directly defined by their predictive value [16][17][18]. But behavioral theorists have long recognized that distinguishing between motivation and prediction makes sense from an ethological perspective [19][20][21]. As noted by Mackintosh [22]: 'it is hard to see why an animal should be expected to engage in more vigorous instrumental activity in the presence of stimuli which, it has learned, signal the imminent delivery of food . . . animals that were active in the presence, and inactive in the absence, of food would be unlikely to survive long enough to serve as subjects for study by psychologists.' Thus, when a desired resource is scarce, it is adaptive to engage in exploratory reward seeking, but, when that resource is expected soon, one should try to secure and consume it while it is still available. Of course, this creates a dilemma: Pursuing new reward opportunities can prevent one from exploiting an expected reward, and vice versa.

Motivation versus reward expectancy
This dilemma is useful for understanding cue-motivated behavior and its apparently inverse relationship with reward expectancy. In PIT, animals seek out reward by lever pressing but must leave the lever and approach the food port to retrieve an expected reward. The degree to which a CS invigorates lever pressing is driven by its motivational properties but should also be negatively constrained by reward expectancy.
A similar distinction is made in the Pavlovian conditioned approach (PCA) literature [23]. When a localizable cue (e.g. lever insertion) predicts reward delivery, animals either approach and interact with the cue itself (sign-tracking) or approach the food port (goal-tracking). Partially reinforced cues are known to elicit higher levels of sign-tracking than continuously reinforced cues [24,25,26], which is notable because sign-tracking is another well-established assay of incentive motivation [8], likely reflecting an inborn conditioned foraging response that serves a similar ethological function to instrumental reward seeking. Thus, per PIT and PCA studies, weak cues are more effective at instigating motivated behavior.
This raises a fundamental question about the nature of cue-motivated behavior. Are weak cues more motivating than strong cues because they are assigned a higher motivational value, or are they simply less likely to trigger a reward expectancy and any associated behavior? The recently introduced incentive hope theory [27,28] takes up the motivational hypothesis, positing that reward uncertainty directly amplifies the motivational value of reward-predictive cues, similar to an increase in physiological need (e.g. hunger, thirst). This boost in motivation is assumed to be greatest when rewards are completely unpredictable and drops off for other cues based on how reliably they signal either the presence or absence of reward. Incentive hope accounts for the finding [24,25] that cues signaling uncertain but relatively large rewards (2 pellets; p = 0.5) elicit higher levels of sign-tracking than cues that reliably signal small rewards (1 pellet, p = 1), despite predicting the same average amount of reward. This theory assumes a further boost in motivation for cues signaling uncertainty in both reward probability and magnitude (1-3 pellets, p = 0.5), in line with observed elevations in sign-tracking [24,25].
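The statement that these cues predict the same average amount of reward while differing in uncertainty can be checked with a few lines of arithmetic. The sketch below assumes that the 1-3 pellet condition delivers 1, 2, or 3 pellets equally often on rewarded trials (the text does not specify this), and it uses variance only as an illustrative index of uncertainty, not as the quantity formally defined by incentive hope theory.

```python
from fractions import Fraction


def mean_and_variance(outcomes):
    """Mean and variance of a discrete reward distribution.

    `outcomes` maps pellet count -> probability (probabilities sum to 1).
    """
    mean = sum(n * p for n, p in outcomes.items())
    variance = sum(p * (n - mean) ** 2 for n, p in outcomes.items())
    return mean, variance


half, sixth = Fraction(1, 2), Fraction(1, 6)
cues = {
    "1 pellet, p = 1": {1: Fraction(1)},
    "2 pellets, p = 0.5": {0: half, 2: half},
    # Assumed: 1, 2, or 3 pellets equally likely on the rewarded half of trials.
    "1-3 pellets, p = 0.5": {0: half, 1: sixth, 2: sixth, 3: sixth},
}

summary = {name: mean_and_variance(dist) for name, dist in cues.items()}
for name, (mean, variance) in summary.items():
    print(f"{name}: mean = {float(mean):.2f} pellets, variance = {float(variance):.2f}")
```

With these assumptions all three cues signal one pellet per trial on average, while the spread of outcomes grows from the certain cue to the cue that is uncertain in both probability and magnitude.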
Such findings are intriguing and have inspired new research investigating the possibility that reward uncertainty sensitizes the motivational system to promote pathological reward-seeking behaviors like gambling and drug abuse [29-31]. However, important questions remain unanswered. Because these studies focused exclusively on cue conditions that signal either maximal reward certainty (p = 1) or uncertainty (p = 0.5), it is not clear whether differences in sign-tracking are associated with this variable or with a confound such as expected reward probability. Moreover, while unreliable cues promote sign-tracking over goal-tracking, it is unclear if this reflects heightened motivation or diminished reward expectancy.

Reward probability, timing, and magnitude
To address such questions, we recently initiated a series of PIT studies investigating how cue-evoked reward expectations influence the balance between lever pressing and food-port approach (i.e. entries) (Figure 1). For instance, we found that cues signaling low probabilities of reward (ps = 0.1 and 0.3) invigorate pressing without eliciting much food-port approach, whereas cues signaling high probabilities of reward (ps = 0.5, 0.7, and 0.9) have little effect on pressing but strongly elicit food-port approach [32] (Figure 1a,d). Thus, PIT varies (inversely) with reward probability, not reward uncertainty, in a manner that seems to reflect conflict between reward seeking and retrieval.
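The distinction between reward probability and reward uncertainty can be made explicit with a short calculation. The sketch below uses the variance of a rewarded/unrewarded outcome, p(1 - p), as a stand-in index of uncertainty; this choice is an assumption made for illustration, although other common indices (e.g. entropy) share the same symmetry around p = 0.5.

```python
def bernoulli_uncertainty(p: float) -> float:
    """Variance of a rewarded/unrewarded outcome with reward probability p."""
    return p * (1.0 - p)


probabilities = [0.1, 0.3, 0.5, 0.7, 0.9]  # cue conditions described in the text
uncertainty = {p: bernoulli_uncertainty(p) for p in probabilities}

for p, u in uncertainty.items():
    print(f"p = {p:.1f}: uncertainty = {u:.2f}")
```

On this index, uncertainty peaks at p = 0.5 and is matched for p = 0.1 versus 0.9 and for p = 0.3 versus 0.7. An uncertainty-driven account would therefore predict similar lever pressing within each matched pair, whereas the reported pattern (pressing at low probabilities, food-port approach at high probabilities) tracks probability itself.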
We have also probed the effects of uncertainty in reward timing on PIT [33] (Figure 1b,e). All rats in this study received pairings between three 30-s cues and food reward. One CS ended with delivery of 3 food pellets (FT-30s, 3 pellets). For a second CS, the number of pellets varied across trials (0-6) but the pellets were always delivered at the end of the CS (FT-30s, 0-6 pellets), whereas for a third CS, individual pellets were delivered at random times during that cue (RT-10s, 1 pellet). At test, the RT-10s CS, which signaled that reward delivery could occur at any moment, elicited an immediate and sustained elevation in food-port entries without increasing lever pressing. In contrast, the FT-30s CSs elicited lever pressing at cue onset, when reward was not expected, followed by increased food-port activity near the end of the cue, when reward was expected. This pattern was not influenced by uncertainty in reward magnitude. Thus, rats flexibly shifted from a seeking mode to a retrieval mode based on the expected time of reward delivery, though this response profile varies with task design [34,35,36,37]. The cue-reward interval has a similar influence on expression of PCA, with long cues eliciting sign-tracking and short cues eliciting goal-tracking [19]. Interestingly, elevations in sign-tracking caused by reward uncertainty appear to be specific to short, reward-proximal cues [38,39], which may reflect a reduction in reward expectancy and not a boost in motivation. While uncertainty in reward magnitude did not alter PIT performance in this study, expectations about reward magnitude do appear to play an important role.

Current Opinion in Behavioral Sciences
Figure 1. Pavlovian-instrumental transfer (PIT) test performance varies with reward expectancy. The tendency for a conditioned stimulus (CS) to elicit lever pressing is low when there is a high probability of reward (a), when it is expected soon (b), and when it is expected to be large in magnitude (c). These low levels of lever pressing are generally associated with high levels of food-port entry, suggesting that reward expectancy exerts opposing effects on these conflicting reward seeking (press) and reward retrieval (approach) activities (d)-(f). (g) Schematic illustration showing the relationship between motivation and reward expectancy in PIT. Reward-paired cues appear to activate distinct motivational and cognitive processes. Such cues generally elicit a motivational impulse to seek out new rewards through instrumental activity, but may also trigger a cognitive reward expectancy encoding information about reward timing, probability and magnitude. We suggest that cognitive control processes are engaged when reward is expected soon in order to suppress unnecessary reward seeking and facilitate efficient reward retrieval. (h) Data from the above studies are re-plotted to show how a response bias measure (presses/(presses + approaches)) varies with reward expectancy. Reward expectancy for fixed-time (FT) schedules was computed using uniform cumulative probability density functions from 0 to t s, adjusted for reward magnitude and probability, partitioned into three 10-s bins (see Panels ...).
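The two quantities described for panel (h) of Figure 1 can be sketched in code. The response bias follows the formula given in the caption directly. The reward-expectancy function is only one possible reading of the caption: it assumes a uniform cumulative distribution that rises from 0 at cue onset to 1 at the scheduled reward time, evaluated at the end of each 10-s bin and scaled by reward magnitude and probability. The function names and the example counts are hypothetical and are not taken from the studies.

```python
def response_bias(presses: float, approaches: float) -> float:
    """Response bias as given in the caption: presses / (presses + approaches)."""
    total = presses + approaches
    if total == 0:
        raise ValueError("response bias is undefined when no responses occurred")
    return presses / total


def ft_reward_expectancy(t_reward_s, magnitude, probability, bin_width_s=10.0):
    """Reward expectancy per bin for a fixed-time (FT) cue.

    Assumed reading of the caption: a uniform cumulative distribution rising
    from 0 at cue onset to 1 at t_reward_s, evaluated at the end of each bin
    and scaled by reward magnitude and probability.
    """
    n_bins = int(round(t_reward_s / bin_width_s))
    return [
        magnitude * probability * min(1.0, (i + 1) * bin_width_s / t_reward_s)
        for i in range(n_bins)
    ]


# Hypothetical counts, for illustration only (not data from the studies).
bias = response_bias(presses=6, approaches=2)
expectancy = ft_reward_expectancy(t_reward_s=30, magnitude=3, probability=1.0)
print(bias, expectancy)
```

A bias near 1 indicates that responding is dominated by lever pressing (seeking), and a bias near 0 indicates that it is dominated by food-port approach (retrieval). Under the assumed reading, expectancy for an FT-30s cue grows across the three bins as the scheduled reward time approaches.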
In a recent study [40] (Figure 1c,f), we trained rats with two distinct 30-s cues, each reliably signaling a fixed amount of reward at cue offset. For one group, both cues signaled 3 pellets, but for the remaining groups, one cue signaled 3 pellets and the alternative cue signaled either a smaller (1 pellet) or larger (9 pellets) reward. At test, only the cue signaling the small reward increased lever pressing. Instead, cues for medium or large reward tended to reduce press rates, particularly at the end of the CS when reward delivery was expected. Once again, changes in lever pressing mirrored opposing changes in food-port entries.
Thus, animals use expectations about the probability, timing, and magnitude of reward to regulate their motivational responses to reward-paired cues. This conclusion is also bolstered by evidence that even while strongly predictive cues acquire motivational properties, their ability to motivate reward seeking is normally suppressed to facilitate reward retrieval. For instance, it is possible to unmask this motivational influence by extinguishing the conditioned food-port approach response before PIT testing [6]. Moreover, while cues that signal imminent food delivery at a specific location normally fail to motivate instrumental reward seeking, cues predicting immediate delivery of an intra-oral sucrose reward do become effective motivators [41,42], presumably because they do not elicit a competing retrieval response. It is also worth noting that reward-paired cues can motivate instrumental reward-seeking actions independently of the specific identity [43,44] or current value [45] of the expected outcome, indicating that their motivational properties are general in nature and divorced from cognitive reward predictions.

Resolving conflict with cognitive control
Konorski [21] outlined an early framework for understanding how reward expectations regulate motivation. He argued that while essentially all reward-paired cues activate the brain's motivational system, cues that are temporally contiguous with reward are uniquely able to activate a detailed cognitive representation of reward. The latter, he proposed, allows strong cues to simultaneously trigger reward-specific consummatory responses (e.g. salivating, orienting toward reward delivery site) and inhibit the motivational system to facilitate such behavior.
More recent accounts [46,47] posit that reward-predictive cues engage dissociable motivational and cognitive control systems. The motivational system allows cues to trigger impulsive reward seeking, whereas the cognitive control system uses contextual information, including reward expectancy, to regulate the motivational system, suppressing mindless or otherwise disadvantageous behavior. This framework is often used to account for biases in sign-tracking versus goal-tracking behavior [46,47] but can also be applied to understand how reward expectations shape expression of PIT [32,33] (see Figure 1g). It has also been argued that cognitive control underlies the tendency for reward-predictive cues to bias action selection to promote the pursuit of specific rewards [48], an effect termed outcome-specific PIT.
This framework also provides a structure for interpreting recent findings on the neural bases of cue-motivated behavior. Cues appear to exert their motivational influence by activating the mesolimbic dopamine system, whose function is required for expression of sign-tracking and PIT but not conditioned food-cup approach [10,49-51]. Mesolimbic dopamine responses to cues vary with their ability to motivate reward seeking [52][53][54] and do not signal reward expectancy per se [55,56]. Dopamine receptor blockade disrupts the response-invigorating influence of reward-paired cues but not their ability to bias action selection toward a specific reward [57]. In contrast, cognitive control over cue-motivated behavior depends on the medial prefrontal cortex and closely associated structures. Prefrontal cholinergic activity is crucial for relevant functions such as attention [46], and animals biased toward sign-tracking exhibit impaired attention and attenuated prefrontal cholinergic signaling [58]. The paraventricular thalamus has also emerged as a key player in cognitive control [59] and works in concert with the medial prefrontal cortex to regulate the balance between sign- and goal-tracking, apparently by modulating mesolimbic dopamine release [60].

Losing control
The notion that motivational impulses are suppressed via cognitive control based on cue-dependent reward expectancies fits nicely with our finding that rats flexibly shift from reward seeking to retrieval as reward anticipation increases (Figure 1). This approach is also useful for studying how cognitive control becomes dysregulated to produce maladaptive behavior (Figure 2). For instance, in addiction, the pathological pursuit of drugs may stem from an excessive urge to use drugs [3] and/or an inability to suppress these urges [2]. Previous studies have shown that rats given repeated exposure to drugs like amphetamine and cocaine exhibit heightened levels of PIT [54,61-63] and sign-tracking [64,65]. While such effects could be driven by an overactive (sensitized) motivational system, they may also reflect a failure to regulate motivation based on reward expectancy. If it is the latter, then the effect of drug pre-exposure should be more apparent when cues elicit a strong reward expectancy.
Figure 2. Modulation of Pavlovian-instrumental transfer (PIT) by reward expectancy is disrupted during adolescence and following repeated cocaine exposure. (a) Rats with a history of repeated cocaine exposure showed higher levels of cue-motivated lever pressing than vehicle-treated controls and, unlike controls, failed to adaptively regulate such behavior based on expected reward timing. (b) Similarly, adolescent rats showed higher levels of cue-motivated lever pressing than adult controls, and failed to adjust their performance based on expected reward probability. These developmental and drug-induced alterations in cue-evoked lever pressing were generally associated with opposing changes in food-port approach behavior (c),(d). Thus, both cocaine-exposed (e) and adolescent (f) rats exhibit an impaired ability to adaptively regulate their reward seeking and retrieval activity according to reward expectancy (computed as in Figure 1h).
To test this, we assessed how cocaine-exposed rats use expectations about reward timing during PIT [33] (Figure 2a,c). Drug-naïve control rats showed the typical within-cue shift from pressing to food-cup approach, consistent with increasing reward expectancy. In contrast, cocaine-exposed rats increased, rather than decreased, their rate of lever pressing over time during cue presentations, which interfered with their ability to check the food cup. Thus, while cocaine-exposed rats expected reward at the end of the cue, they lacked control over the urge to simultaneously seek out new rewards (Figure 2e). Similarly, it was recently shown that repeated amphetamine exposure enhances sign-tracking and decreases goal-tracking but only for cues that reliably signal reward [38]. The lack of effect for cues signaling uncertain reward suggests to us a failure of cognitive control and not an overactive motivational system. It is also notable that the ability to select actions based on outcome-specific reward expectations is impaired in rats pre-exposed to amphetamine [66] or a long-term junk-food diet [67], which may reflect a similar cognitive control deficit.
We have also probed changes in cue-motivated behavior during adolescence (Figure 2b,d). This period of life is strongly associated with risky behaviors (e.g. vaping, binge drinking, unprotected sex). It is believed that such activities may be the product of an age-dependent imbalance between an overactive motivational system and a developing but ultimately inadequate cognitive control system [68][69][70]. We tested this by comparing adolescent and adult male rats on a probabilistic version of PIT [32]. Normal adult rats showed clear sensitivity to reward expectancy, increasing their press rate during a cue that signaled a low probability of reward, but withholding this behavior to approach the food cup during a higher-probability cue. Adolescent rats, in contrast, showed elevated lever pressing to both low-probability and high-probability cues, indicating an inability to adaptively control motivational impulses when they are counterproductive (Figure 2f).

Summary
We have reviewed evidence that reward expectancies play an important role in regulating cue-motivated behavior. Animals are motivated to seek out reward when presented with reward-paired cues but withhold such behavior when imminent reward is expected, an influence modulated by reward probability, timing, and magnitude.
These findings reveal the flexible nature of cue-motivated behavior but also highlight a challenge in modeling such behavior in rodents. Even our most selective assays of conditioned motivation (i.e. PIT and sign-tracking) seem to engage cognitive control processes that regulate exploratory reward seeking when cues signal imminent reward. Under such conditions, differences in cue-motivated behavior across individuals and/or groups may reflect differences either in the motivation to seek out new rewards or in the ability to exert adaptive control over such behavior. We suggest that it is possible to parse these motivational and cognitive control processes by including cue conditions that differ in their ability to evoke a strong reward expectancy. This approach may prove useful for elucidating the mechanisms that support conditioned motivation and cognitive control, as well as for developing biomarkers to investigate the distinct roles these two processes may play in addiction and other motivational disorders.
That said, more research is also needed to further validate this conceptual framework. For instance, it is possible that strong reward-predictive cues fail to invigorate (and sometimes suppress) instrumental lever pressing because they are simply more potent triggers of the competing food-port approach response, and not because they engage specialized cognitive control mechanisms to provide behavioral flexibility based on detailed reward expectancies. One way to test the cognitive control account is by assessing the effects of post-training reward devaluation, which should attenuate the suppressive influence of reward-predictive cues on lever pressing if this effect is, indeed, mediated by a cognitive representation of the expected reward.

Conflict of interest statement
Nothing declared.

27. Anselme P: Incentive salience attribution under reward uncertainty: a Pavlovian model. Behav Processes 2015, 111:6-18. This paper presents a bold model of appetitive Pavlovian conditioning which provides a formal account for the facilitatory effects of partial reinforcement on sign-tracking. A key assumption is that reward uncertainty boosts the attribution of incentive value to reward-paired cues. This theory allows for clear, testable predictions for an important but poorly understood feature of appetitive behavior. We have outlined an opposing account in the current review emphasizing the suppressive influence of reward expectancy on motivated behavior.

33. Marshall AT, Ostlund SB: Repeated cocaine exposure dysregulates cognitive control over cue-evoked reward-seeking behavior during Pavlovian-to-instrumental transfer. Learn Mem 2018, 25:399-409. This paper reviews previous research demonstrating that repeated psychostimulant exposure enhances expression of cue-motivated behavior (focusing on PIT) and identifies an important alternative account involving the loss of cognitive control. New findings and a reanalysis of archival data reveal that the ability to flexibly regulate reward-seeking behavior based on expected reward timing is impaired in cocaine-exposed rats.

35. Matell MS, Della Valle RB: Temporal specificity in Pavlovian-to-instrumental transfer. Learn Mem 2018, 25:8-20. This paper explores the role of cue-reward interval timing in the expression of cue-motivated lever pressing in PIT. Cues used in this study signaled relatively long delays (60 or 120 s) between cue onset and reward delivery, in contrast to the shorter delays (30 s or less) used in Ref. [33]. Findings varied with stimulus modality (visual versus auditory) but suggest that cues signaling long delays in reward delivery tend to have a delayed motivational effect on instrumental reward seeking.