The rodent lateral orbitofrontal cortex represents expected Pavlovian outcome value but not identity

The orbitofrontal cortex (OFC) is critical for updating reward-directed behaviours flexibly when task contingencies are reversed, or when outcomes are devalued. We systematically examined the generality of these findings using lesions of the rodent lateral OFC (LO) in instrumental action-outcome, and Pavlovian cue-outcome, learning using specific satiety and taste aversion methods of outcome devaluation. LO lesions disrupted outcome devaluation in Pavlovian but not instrumental procedures. Furthermore, this effect was only observed when using taste-aversion devaluation. Using a specific Pavlovian-to-Instrumental transfer procedure, we established that LO is not necessary for the representation of specific outcome properties, but rather in using these properties to access the current motivational value of outcomes. The role of LO in outcome devaluation and reversal learning was also dissociable between anterior and posterior subregions. These novel dissociable task- and subregion-specific effects suggest a way to reconcile contradictory findings between rodent and non-human primate OFC research.


Introduction
The orbitofrontal cortex (OFC) in rodents and primates is critical for updating behaviour flexibly when outcome contingencies change (Murray, O'Doherty, & Schoenbaum, 2007). Compelling evidence for this view comes from studies using outcome devaluation procedures, in which the value of a reward is reduced to test whether behaviour is updated to reflect changes in the outcome's current value. In rodents, OFC lesions disrupt the appropriate reduction in anticipatory responding for a reward that has been paired with illness and has become aversive (Gallagher, McMahan, & Schoenbaum, 1999 Takahashi, Schoenbaum, & Niv, 2014) argue that deficits in outcome devaluation following OFC lesions are due to an inability to access the representation of the specific identity of expected outcomes. However, deficits in outcome devaluation could also be due to an inability to use an intact representation of expected outcome identity to access its current motivational value.
One aspect of devaluation procedures that may dissociate these two mechanisms (expected identity vs. expected value) is which method of outcome devaluation is used. The two most commonly employed methods of outcome devaluation are taste aversion, i.e. pairing the reward with illness so that the reward becomes aversive, and sensory specific satiety, i.e. consuming the specific reward to satiety so that it is no longer very rewarding (Holland & Straub, 1979). Devaluation by sensory specific satiety involves recent access to the specific outcome immediately prior to test. This recent repeated access to the outcome may lead to habituation of the sensory representation (identity) of the outcome and provides recent experience of the current motivational value of the outcome.
In contrast, taste aversion methods may force the organism to successfully recall outcome identity at test without the aid of recent outcome exposure (Colwill & Rescorla, 1985; Holland, 2004).
Model-based and sensory-specific outcome-expectancy coding accounts of the OFC (Delamater, 2007; Rudebeck & Murray, 2014; Wilson et al., 2014) predict that OFC lesions should disrupt the devaluation effect regardless of whether specific satiety or taste aversion methods are used. This is because appropriate flexible behaviour under both methods of devaluation requires that an animal first access the model-based state-representation/identity of the expected outcome. If the OFC is only necessary for accessing the current motivational value of expected outcomes but not their identities, then OFC lesions may not disrupt outcome devaluation by specific satiety. We directly test this prediction using excitotoxic lesions of the OFC in both satiety and taste aversion devaluation procedures. the test (Devaluation x Cue ANOVA, all F's < 1, p's > .51), suggesting that the effects on lever pressing were not confounded by differences in competing magazine responding to the cues.
These findings suggest that one associative pathway that might contribute to behavioural control in Pavlovian devaluation tasks using specific satiety is habituation, or a reduction in the signalling efficacy of the sensory specific properties of expected outcomes. In contrast, devaluation using taste aversion leaves the signalling properties of expected outcomes intact (Holland, 2004; Rescorla, 1992). Given that lesions of the rodent OFC disrupt devaluation by taste aversion (Pickens et al., 2003, 2005), the intact devaluation we observe following specific satiety in OFC lesioned animals can be accounted for by this alternative pathway.
Specifically, OFC lesions disrupt the use of specific outcome properties to access the current motivational value of an expected outcome (as required by taste aversion devaluation), but do not disrupt the representation of the sensory specific properties of expected outcomes per se. In fact, pre-training OFC lesions do not disrupt specific Pavlovian to instrumental transfer (Ostlund & Balleine, 2007), an effect we have confirmed with our lesion and behavioural parameters (supplementary Figure S1).

Pavlovian devaluation by taste aversion
An alternative account of the absence of OFC lesion effects on Pavlovian devaluation using sensory specific satiety, in contrast to robust deficits in devaluation by taste aversion in rodents (Pickens et al., 2003, 2005), is the extent and specificity of OFC lesion damage. It is notable that OFC lesions in these taste aversion studies encompass many orbital subregions (VO, LO, DLO, AI, and even MO). In contrast, the OFC lesions in the present studies are predominantly focussed on the anterior extent of LO, similar to those employed by Ostlund and Balleine (2007). In addition to testing whether these anterior LO lesions are sufficient to replicate the effect of large OFC lesions on outcome devaluation by taste aversion, we created a second group of lesioned animals with posterior LO lesions. Rats underwent sham or excitotoxic lesion surgery using a range of co-ordinates, and two distinct lesion groups were established (described in the methods section): anterior and posterior LO lesion groups (Figure 4A, Figure S2), defined by damage predominantly anterior or posterior to bregma +3.70, respectively (Figure 4B).
First, all animals were trained on two unique Pavlovian cue-outcome relationships. Acquisition of responding to the CSs predicting the to-be-devalued and non-devalued USs did not differ within groups but differed between lesion groups (Figure 4C), such that responding was lower in the posterior OFC lesion group. A mixed Group x CS (devalued, non-devalued) x DayBlock (4 Blocks of 3 days) ANOVA supported this observation with significant main effects of Group (F(2, 41) = 3.67, p = .03) and DayBlock (F(3, 123) = 102.14, p < .001), but all other effects failed to reach significance (Group x US F(2, 41) = 2.55, p = .09, Group x US x DayBlock F(6, 123) = 2.01, p = .07, all remaining F < 1.00, p > .44).
Bonferroni corrected pairwise comparisons of overall responding revealed that the posterior group had lower performance than the sham group (F(1, 41) = 7.34, p = .03), but no significant differences were found between the anterior and sham groups (F(1, 41) = 2.29, p = .42) or the posterior and anterior groups (F(1, 41) = 1.65, p = .62).
Taste aversion was successfully acquired by all groups (Figure 4D). Food consumption (g) was analysed using a Group x Pairing (injection 1, 2) x Devaluation (LiCl, saline) ANOVA, which revealed significant effects of Devaluation (F(1, 41) = 8.23, p = .01) and Pairing (F(1, 41) = 141.39, p < .001) and a Pairing x Devaluation interaction (F(1, 41) = 37.83, p < .001), but no main effect of, or interactions with, Group (all remaining F < 1.00, p > .68). Follow-up simple effects revealed that consumption of the US paired with LiCl did not differ from saline prior to the first injection (pairing 1 F(1, 41) = 0.09, p = .77), but was significantly reduced relative to saline prior to the second injection (pairing 2 F(1, 41) = 59.20, p < .001). The third injection pairings, performed in the test chambers, showed successful transfer of the taste aversion to this context in all groups (Figure 4E). A Group x Devaluation mixed ANOVA on magazine duration revealed a significant effect of Devaluation (F(1, 41) = 16.16, p < .001) that did not differ with Group (all remaining F < 1.57, p > .22). Taken together, consumption of and approach towards the US paired with LiCl were successfully reduced compared to the US paired with saline injections, and the magnitude of this unique taste aversion did not differ between groups.
Devaluation testing was conducted under extinction to ensure that behaviour was guided by the expected/recalled value of the outcomes (Figure 4F).
The sham group showed a significant reduction in magazine behaviour to the CS that predicted the devalued relative to the non-devalued US, but this devaluation effect was not evident in the anterior and posterior lesion groups. This pattern of results was supported by a Group x Devaluation mixed ANOVA revealing a significant Group x Devaluation interaction (F(2, 41) = 3.46, p = .04); the main effects of Devaluation (F(1, 41) = 3.74, p = .06) and Group (F(2, 41) = 0.41, p = .41) did not reach significance. Simple effects revealed that this interaction was due to a significant devaluation effect in the sham group (F(1, 41) = 7.33, p = .01), but not in the anterior (F(1, 41) = 2.06, p = .16) or posterior groups (F(1, 41) = 0.81, p = .37). This suggests that lesions of either the anterior or the posterior LO are sufficient to disrupt Pavlovian devaluation by taste aversion, an effect previously established with much larger OFC lesions in rodents (Pickens et al., 2003, 2005).
Next, a US-specific reinstatement test was conducted to see if the lesion groups could appropriately reduce behaviour to the devalued cue following a brief reminder of the outcome value. Rats were first exposed to one of the USs, and after a short delay they were presented with the CS that predicted that US (in extinction). All groups remained sensitive to the taste aversion when re-exposed to the USs in the test chamber (uneaten devalued USs were observed by the experimenter when cleaning the chamber prior to test). A mixed Group x Period (pre, post US delivery) x Devaluation ANOVA on magazine behaviour during US re-exposure (data not shown) revealed significant effects of Period (F(1, 41) = 71.20, p < .001) and Devaluation (F(1, 41) = 72.05, p < .001) and a Period x Devaluation interaction (F(1, 41) = 79.30, p < .001; all remaining F < 1.33, p > .28).
Simple main effects revealed that magazine behaviour did not differ before US delivery (F(1, 41) = 1.23, p = .02), but was significantly higher after delivery of the non-devalued than the devalued US (F(1, 41) = 93.46, p < .001). Devaluation interaction (F(2, 41) = 1.97, p = .15). Therefore, re-exposure to the US prior to test elicited a robust devaluation effect in all groups. This suggests that the disruption of the Pavlovian devaluation effect following LO lesions is not caused by a failure to acquire sensory specific cue-outcome associations, by a failure to acquire a sensory specific taste aversion, or by perseverative responding to any predictive cues. Instead, the deficit is specific to recalling the new value of the devalued outcome and/or integrating it into appropriate behavioural control.
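The pairwise comparisons of overall acquisition responding reported in this section (posterior vs sham, anterior vs sham, posterior vs anterior) were Bonferroni corrected. As a minimal sketch of that adjustment, assuming the standard procedure of multiplying each uncorrected p-value by the number of comparisons and capping the result at 1, the helper below shows the arithmetic. The function name `bonferroni_adjust` is hypothetical, not taken from the authors' analysis, and the p-values in the example are illustrative rather than the study's data.

```python
def bonferroni_adjust(p_values, n_comparisons=None):
    """Bonferroni-adjust a family of uncorrected p-values.

    Each p-value is multiplied by the number of comparisons in the family
    (by default, the number of p-values supplied) and capped at 1.0.
    """
    m = len(p_values) if n_comparisons is None else n_comparisons
    if m < 1:
        raise ValueError("need at least one comparison")
    for p in p_values:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p-value out of range: {p}")
    return [min(1.0, p * m) for p in p_values]


# Illustrative (hypothetical) uncorrected p-values for the three group contrasts.
contrasts = ["posterior vs sham", "anterior vs sham", "posterior vs anterior"]
raw_p = [0.01, 0.14, 0.21]
for name, p_adj in zip(contrasts, bonferroni_adjust(raw_p)):
    print(f"{name}: adjusted p = {p_adj:.2f}")
```

Equivalently, each uncorrected p-value can be compared against a per-comparison criterion of α/m, which for three contrasts at α = .05 is about .017.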

Sign-tracking and reversal
The finding that posterior LO lesions retarded acquisition of initial Pavlovian conditioned approach behaviour is surprising given that these animals can appropriately modulate their cue-driven behaviour based on outcome value when given contact with the US in a reinstatement test. It was hypothesised that this might reflect an impairment in the attribution of value/salience to the Pavlovian cue itself. When a lever is used as a Pavlovian cue, rats will come to approach and engage with the lever cue (sign-tracking) instead of the normal conditioned approach to the magazine (goal-tracking behaviour) ( argued that sign-tracking behaviour reflects a process by which the lever CS acquires enhanced incentive salience so that the incentive motivational value of the outcome becomes attributed to the cue (Berridge, 2004). Therefore, it was predicted that the posterior LO group would not attribute incentive salience to a lever cue and would show a deficit in sign-tracking. The sham, anterior, and posterior LO lesion groups were retrained on a discriminated sign-tracking procedure using rewarded (CS+) and non-rewarded (CS-) lever cues (left and right lever, counterbalanced).
To ensure that any differences in lever pressing are not confounded by differential levels of competing responses, it is important to establish that there are no group differences in baseline magazine behaviour. Mixed Group x DayBlock (4 blocks of 3 days) ANOVAs showed that PreCS magazine duration did not differ between groups during acquisition (main effects of Group or Group x DayBlock interactions, all F < 1.75, p > .12) or subsequent reversal (all F < 2.01, p > .14, data not shown).
During acquisition, lever pressing during the CS+ was greater than during the CS-, but the lesion groups made fewer responses than the sham group (Figure 5A The difference scores on the standardised variate revealed that during reversal of the rewarded lever, all groups responded more towards the magazine than the lever at the start of training (Figure 5D). However, by the end of training the sham and anterior groups were performing more to the lever than the magazine, whereas the posterior group was performing equally to both the magazine and the lever.
Our results demonstrate a number of important neural and behavioural dissociations within the rodent OFC. First, we directly confirm the dissociable role of the rodent OFC in Pavlovian but not instrumental behavioural flexibility following outcome devaluation (Ostlund & Balleine, 2007). Next, we demonstrate that OFC lesions in rodents disrupt the Pavlovian outcome devaluation effect only when outcome value is manipulated by taste aversion, not by specific satiety. Using a specific PIT test, we establish that, unlike taste aversion devaluation, specific satiety devaluation can act via a reduction in the efficacy of sensory specific outcome properties, a reduction which appears to be intact following OFC lesions. Finally, we show that the roles of the OFC in outcome devaluation and reversal learning are dissociable within anterior and posterior subregions of rodent LO. Together, these findings allow many contradictory findings in OFC research to be reconciled as functional heterogeneity within the putative orbital subregions. are likely to achieve this by recruiting different psychological processes. Therefore, to understand the functional role of the OFC it is important to consider the differences between these devaluation procedures.
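The reversal analysis above summarises behaviour as difference scores on a standardised variate, which is needed because lever presses (counts) and magazine time (seconds) are on different scales. The exact standardisation is specified in the authors' methods rather than in this section; the sketch below assumes one common choice, z-scoring each measure across all observations entering the comparison and subtracting magazine from lever. The function names are hypothetical and the numbers are illustrative, not the study's data.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardise a list of scores to mean 0 and (population) SD 1."""
    m = mean(values)
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a measure with zero variance")
    return [(v - m) / sd for v in values]


def lever_minus_magazine(lever_scores, magazine_scores):
    """Difference between standardised lever and magazine measures.

    Positive values: relatively more lever-directed (sign-tracking) behaviour.
    Negative values: relatively more magazine-directed (goal-tracking) behaviour.
    Both lists must refer to the same observations (e.g. rat x session).
    """
    if len(lever_scores) != len(magazine_scores):
        raise ValueError("lever and magazine lists must be the same length")
    z_lever = zscore(lever_scores)
    z_mag = zscore(magazine_scores)
    return [zl - zm for zl, zm in zip(z_lever, z_mag)]


# Hypothetical CS+ lever presses and magazine durations (s) for four observations.
lever = [2, 5, 14, 20]
magazine = [30.0, 24.0, 10.0, 6.0]
print([round(d, 2) for d in lever_minus_magazine(lever, magazine)])
```

Under this assumed scheme, a score near zero corresponds to the pattern described for the posterior group late in reversal, with lever- and magazine-directed behaviour roughly balanced.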
To selectively reduce responding for a devalued outcome, an organism must have access to the specific sensory properties of the predicted outcome (i.e. identity information), as well as the current motivational value of the predicted outcome (i.e. value information). It has been shown that taste aversion modifies the value of a predicted outcome, but leaves intact the access to the sensory specific properties of the predicted outcome. For example, rats will significantly enhance instrumental lever responding in the presence of a Pavlovian cue if both the response and cue reliably predict the same outcome (i.e. the specific PIT effect), an effect that is mediated by the specific properties of the predicted outcome acting as a sensory cue and that is unaffected by taste aversion learning (Colwill & Rescorla, 1990; Gilroy et al., 2014; Holland, 2004; Rescorla, 1992).

The specific satiety procedure involves repeated consumption of a specific outcome until the animal reaches satiety, a procedure that simultaneously reduces the motivational value of the outcome and potentially involves habituation of the sensory systems required to represent the sensory properties of the outcome. We hypothesised that, if specific satiety involves the habituation of the sensory specific properties of an expected outcome, then its signalling properties should be greatly reduced and unable to mediate specific PIT. We confirmed this prediction in a behavioural experiment. Following Pavlovian and instrumental training for distinct outcomes, Pavlovian cues selectively enhanced instrumental responding for the same outcome, but not when the outcome had been devalued by specific satiety. Therefore, unlike taste aversion devaluation (Holland, 2004; Rescorla, 1992), specific satiety devaluation reduces the effective signalling capacity of predicted sensory specific outcome properties and impairs specific PIT.
The specific OFC lesion deficit in Pavlovian outcome devaluation following taste aversion, but not sensory specific satiety, suggests that OFC lesions do not disrupt the representation of the sensory specific properties of predicted outcomes. This is supported by intact performance on specific PIT tasks following OFC lesions (Figure S1; Ostlund & Balleine, 2007), a task that critically depends on using the signalling properties of predicted outcomes to guide behaviour, i.e. the sensory properties of the expected outcome (S) can potentiate the instrumental lever response (R) by an S-R association. Therefore, following OFC lesions, rodents can successfully represent and utilise the specific properties of predicted outcomes to guide behaviour, which allows them to flexibly update behaviour in Pavlovian devaluation by specific satiety and in specific PIT procedures.
However, OFC lesions disrupt flexible behaviour when the task requires the use of these intact specific properties of predicted outcomes to access the current motivational value of the outcome, as in Pavlovian devaluation procedures using taste aversion.
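The specific PIT logic described above turns on whether a Pavlovian cue elevates pressing more on the lever that earned the same outcome than on the lever that earned a different outcome. A minimal scoring sketch follows, assuming press rates (presses/min) during the CS and during a matched pre-CS baseline; the names and the baseline-subtraction scheme are illustrative assumptions, not necessarily the exact scoring used in this paper.

```python
from typing import NamedTuple


class PitScores(NamedTuple):
    same: float         # CS-evoked elevation on the lever that earned the CS's outcome
    different: float    # CS-evoked elevation on the lever that earned the other outcome
    specificity: float  # same - different; > 0 indicates outcome-specific transfer


def specific_pit(same_cs, same_pre, diff_cs, diff_pre):
    """Baseline-corrected specific PIT scores from lever-press rates.

    *_cs  : press rate during the Pavlovian cue
    *_pre : press rate during the matched pre-CS period
    """
    same = same_cs - same_pre
    different = diff_cs - diff_pre
    return PitScores(same, different, same - different)


# Hypothetical rates (presses/min) for one rat.
print(specific_pit(same_cs=12.0, same_pre=4.0, diff_cs=5.0, diff_pre=4.0))
```

On this scoring, the selective enhancement of same-outcome responding corresponds to a positive specificity value, and the loss of that selectivity after specific satiety devaluation to a specificity near zero.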

Sign-tracking
A consistently reported finding following OFC lesions is that initial acquisition and behavioural expression of either cue-outcome or action-outcome contingencies is left intact acquisition. It is unlikely that this effect is simply a general suppression of activity (as there was no difference in locomotor activity, Figure S3) or of appetite (as there was no difference in consumption levels at the start of taste aversion learning).
One possible account of the reduced Pavlovian conditioned approach behaviour in the posterior LO group is that the CS did not acquire incentive salience. Incentive salience refers to the process by which the incentive-motivational properties of the outcome are transferred to the CS (Berridge, 2004), such that if a lever CS is presented a rat will attempt to "consume" the lever as if it were the pellet that it predicts. This behaviour directed at the lever CS (sign-tracking) comes at the expense of the traditional Pavlovian approach response to the site of reward delivery, the magazine (goal-tracking). Relative to sham controls, anterior LO lesions did not affect the propensity to acquire sign-tracking behaviour, whereas sign-tracking was significantly reduced following posterior LO lesions. This finding is consistent with evidence that rats showing stronger sign-tracking tendencies have increased c-fos activity in posterior OFC regions following lever cue presentation (Flagel et al., 2011). This suggests that the posterior, but not the anterior, LO mediates the attribution of incentive salience to cues. Alternatively, posterior LO may be involved in resolving response competition when multiple responses are supported by a predictive cue.
In the present experiment, the sign-tracking procedure was preceded by extensive Pavlovian training during the outcome devaluation procedure, which may have resulted in a pre-existing dominant magazine approach response that could not be overcome following posterior LO lesions.
Surprisingly, extensive LO lesions have also been shown to have no effect on sign-tracking behaviour (Chang, 2014), but did retard subsequent reversal learning when rewarded

Methods
Animals. Rats were housed four per cage in ventilated Plexiglass cages in a temperature-regulated (22 ± 1°C) and light-regulated (12 h light/dark cycle, lights on at 7:00 AM) colony room. At least one week prior to behavioural testing, feeding was restricted to ensure that weight was approximately 95% of ad libitum feeding weight, and never dropped below 85%. All animal research was carried out in accordance with the National Institute of
Apparatus. Behavioural testing was conducted in eight identical operant chambers (30.5 x 32.5 x 29.5 cm; Med Associates) individually housed within ventilated sound-attenuating cabinets. Each chamber was fitted with a 3-W house light that was centrally located at the top of the left-hand wall. Food pellets could be delivered into a recessed magazine, centrally located at the bottom of the right-hand wall. Delivery of up to two separate liquid rewards via rubber tubing into the magazine was achieved using peristaltic pumps located above the testing chamber. The top of the magazine contained a white LED light that could serve as a visual stimulus. Access to the magazine was measured by infrared detectors at the mouth of the recess. Two retractable levers were located on either side of the magazine on the right-hand wall. A speaker located to the right of the house light could provide auditory stimuli to the chamber. In addition, a 5-Hz train of clicks produced by a heavy-duty relay placed outside the chamber at the back right corner of the cabinet was used as an auditory stimulus. The chambers were wiped down with ethanol (80% v/v) between sessions. A computer equipped with Med-PC software (Med Associates Inc., St. Albans, VT, USA) was used to control the experimental procedures and record data.
Devaluation chambers. To provide individual access to reinforcers during the devaluation procedure, rats were individually placed into a mouse cage (33 x 18 x 14 cm clear Perspex cage with a wireframe top). Pellet reinforcers were presented in small glass ramekins inside the box and liquid reinforcers were presented in water bottles with a sipper tube. One day prior to the start of the devaluation period, all rats were exposed to the mouse cages and given 30 min of free access to home cage food and water to reduce the novelty of the context and of consuming from the ramekins and water bottles. were pre-exposed to the reinforcers (10 g of pellets per animal and 25 ml of liquid reinforcer per animal) in their home cage.
Magazine training. In all experiments, animals received two sessions of magazine training, one for each reinforcer, with the following parameters: reward delivery was on an RT60 s schedule for 16 rewards, with the house light and fan kept on throughout the session. Sessions were separated by at least 2 hours.

Experiment 1. Instrumental Devaluation by LiCl Taste Aversion
All animals received 2 separate sessions of training each day with the pellet and sucrose rewards: an instrumental lever training session (lever extended) and a magazine training session (lever retracted) with non-contingent reward delivery to provide equivalent exposure to the alternative reward. The order of training sessions and the identity of the instrumental and alternative reward were fully counterbalanced across all groups. All training sessions were separated by a period of at least 2 hours.
First, animals were familiarised with lever training using a fixed ratio 1 schedule (FR1, reward delivered on each lever press) for 60 min or until a maximum of 25 rewards were earned. The alternative, non-instrumental, reward was delivered on an RT30s (random time 30 s) schedule for 1 hour or until 25 rewards had been delivered.
Instrumental acquisition training occurred on the following 3 days. Instrumental training sessions lasted until 40 rewards were earned, and lever pressing was rewarded on an RI30s schedule (random interval 30 s, such that on average a reward becomes available every 30 s and is delivered following the next lever press). The alternative reward session involved an RT30s schedule for 40 rewards. The use of interval and time based schedules of reinforcement was designed to match the instrumental and alternative reward sessions so that all experiences were identical except for the presence (and response requirement) of the lever in the instrumental session.
Following devaluation of the reward by taste aversion, all animals were tested with the instrumental lever to assess devaluation. The test was conducted under extinction and the lever was extended for 10 min. On the following day, all animals were given a 20-min re-acquisition test to assess devaluation in the presence of the instrumental reinforcer (RI30s schedule).
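The RI30s and RT30s schedules described above can be made concrete with a small simulation. The sketch assumes a common way of implementing such schedules in 1-s steps, with a fixed per-second arming (RI) or delivery (RT) probability of 1/30; the Med-PC code actually used is not given here, so the function names, the 1-s resolution, and the default 3600-s session cap are hypothetical. The point it illustrates is that on RI an available reward still requires a lever press, whereas on RT the reward is delivered regardless of behaviour.

```python
import random


def simulate_ri(press_times, mean_interval=30.0, session_len=3600,
                max_rewards=40, seed=0):
    """Random-interval (RI) schedule sketch.

    Each 1-s tick, an unarmed schedule becomes armed with probability
    1/mean_interval, so waiting times are geometric with mean mean_interval s.
    The first lever press after arming is rewarded and disarms the schedule.
    Returns the times (s) of rewarded presses.
    """
    rng = random.Random(seed)
    presses = sorted(press_times)
    rewarded = []
    armed = False
    i = 0
    for t in range(session_len):
        if not armed and rng.random() < 1.0 / mean_interval:
            armed = True
        # Handle every press that falls within the current tick [t, t + 1).
        while i < len(presses) and presses[i] < t + 1:
            if armed:
                rewarded.append(presses[i])
                armed = False
                if len(rewarded) >= max_rewards:
                    return rewarded
            i += 1
    return rewarded


def simulate_rt(mean_interval=30.0, session_len=3600, max_rewards=40, seed=0):
    """Random-time (RT) schedule sketch: rewards arrive at random 1-s ticks
    with probability 1/mean_interval, independent of any behaviour."""
    rng = random.Random(seed)
    deliveries = []
    for t in range(session_len):
        if rng.random() < 1.0 / mean_interval:
            deliveries.append(t)
            if len(deliveries) >= max_rewards:
                break
    return deliveries


# Hypothetical rat pressing once every 5 s throughout the session.
presses = [t + 0.5 for t in range(0, 3600, 5)]
print(len(simulate_ri(presses)), len(simulate_rt()))
```

Because both schedules share the same underlying timing process, reward rates in the instrumental and alternative-reward sessions are broadly comparable as long as the rat presses frequently, which is the matching logic the design relies on.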

Taste Aversion
Following instrumental training, all animals received taste aversion training on one of the reinforcers. Half the animals in each surgery condition (sham and lesion) were allocated to a