Acetylcholine is released in the basolateral amygdala in response to predictors of reward and enhances learning of cue-reward contingency

The basolateral amygdala (BLA) is critical for associating initially neutral cues with appetitive and aversive stimuli and receives dense neuromodulatory acetylcholine (ACh) projections. We measured BLA ACh signaling and principal neuron activity in mice during cue-reward learning using a fluorescent ACh sensor and calcium indicators. We found that ACh levels and activity of nucleus basalis of Meynert (NBM) cholinergic terminals in the BLA (NBM-BLA) increased sharply in response to reward-related events and shifted as mice learned the tone-reward contingency. BLA principal neuron activity followed reward retrieval and moved to the reward-predictive tone after task acquisition. Optical stimulation of cholinergic NBM-BLA terminal fibers during cue-reward learning led to more rapid learning of the cue-reward contingency. These results indicate that BLA ACh signaling carries important information about salient events in cue-reward learning and provide a framework for understanding how ACh signaling contributes to shaping BLA responses to emotional stimuli.


Introduction
Learning how environmental stimuli predict the availability of food and other natural rewards is critical for survival. The basolateral amygdala (BLA) is a brain area necessary for associating cues with both positive and negative valence outcomes (Baxter & Murray, 2002; Janak & Tye, 2015; LeDoux et al., 1990). Recent work has shown that genetically distinct subsets of BLA principal neurons encode the appetitive and aversive value of stimuli (J. Kim et al., 2016). This encoding involves the interplay between principal neurons, interneurons, and incoming terminal fibers, all of which need to be tightly regulated to function efficiently.

The neuromodulator acetylcholine (ACh) is released throughout the brain and can control neuronal activity via a wide range of mechanisms. ACh signals through two families of receptors (nicotinic, nAChRs and muscarinic, mAChRs) that are differentially expressed on BLA neurons as well as their afferents (Picciotto et al., 2012). ACh signals through these receptors to increase signal-to-noise ratios and modify synaptic transmission and plasticity in circuits involved in learning new contingencies (Picciotto et al., 2012), especially in areas that receive dense cholinergic input, like the BLA (Woolf, 1991; Zaborszky et al., 2012).

The basal forebrain complex is a primary source of ACh input to the BLA. In particular, the nucleus basalis of Meynert (NBM) sends dense cholinergic projections to the BLA (Woolf,

BLA cholinergic terminal activation (Fig. S4.4D, two-way repeated-measures ANOVA, F (1, 9) = 0.05804, p = 0.8150). Finally, to determine whether there was any effect of NBM-BLA cholinergic terminal stimulation on preference for, or avoidance of, a stressful environment, mice were tested for changes in time spent in the dark or light side due to laser stimulation in the Light/Dark Box test, and there were no differences between the groups (Fig.
S4.4E-F

these mice maintaining high levels of incorrect nose pokes for the duration of Training compared to Saline and Mec treated mice (Fig. 5C + Fig. S5.1B, pink shading, main effect of Group (antagonist) in a two-way repeated-measures ANOVA, F (3, 30) = 25.64, p < 0.0001). Saline and Mec groups were not significantly different in any phase of the task, including across Extinction (Fig. 5B-C + Fig. S5.1A-B, orange shading, main effect of Group (antagonist) in a two-way repeated-measures ANOVA, F (1, 15) = 1.201, p = 0.2903). Consistent with the inability to acquire the cue-reward contingency, mice treated with Scop or Mec+Scop also obtained very few rewards during Extinction (Fig. 5B + Fig. S5.1A, orange shading). The antagonists had no effect on locomotion as measured by beam breaks (Fig. S5.1C

Yakel, 2011). It is therefore possible that ACh signaling may result in intracellular signaling changes that outlast the cue presentation window. In order to determine if the effect of NBM-BLA stimulation is dependent upon the timing of correct nose poke and laser stimulation contingency, we repeated the experiment in an independent cohort of mice with an additional yoked, non-contingent ChR2 group that received the same number of stimulation trains as the contingent ChR2 group, but in which light stimulation was explicitly unpaired with task events (Fig. 6A + Fig. S6.1). As in the previous experiment, there were no differences between the EYFP control (n = 6) and stimulation groups (contingent ChR2 n = 5 and Yoked non-contingent ChR2 n = 5) during Pre-Training (Fig. 6B-C + Fig. S6.2 A-

both ChR2 groups were significantly better than the EYFP control group (Fig. 6B-C + Fig.
S6

locked to the cue, nose poke, or reward retrieval to improve performance of the task, suggesting that ACh may alter the threshold for neuronal plasticity for cue-reward pairing over a much longer timescale than might be expected based on results from the ACh3.0 recording and NBM-BLA recordings, which could be consistent with the involvement of mAChR signaling in this effect. As in the previous experiment, once all groups reached criterion for acquisition of the cue-reward contingency, there were no differences between any of the groups during Extinction

It is increasingly recognized that the BLA is involved in learning to predict both positive and negative outcomes from previously neutral cues (Cador et al., 1989; Janak & Tye, 2015; LeDoux et al., 1990). Cholinergic cells in the basal forebrain complex fire in response to both positive and negative reinforcement (Hangya et al., 2015). The results shown here indicate that ACh signaling in the BLA is intimately involved in cue-reward learning. Endogenous ACh is released in the BLA in response to salient events in the task, and ACh dynamics evolved as the subject formed associations between stimuli and reward. While the pattern of ACh signaling in the BLA may seem reminiscent of how dopamine neurons encode reward prediction errors as measured in other brain areas (Schultz et al., 1997), the current results suggest that ACh release in the BLA may instead be involved in signaling a combination of salience and novelty. ACh release and NBM-BLA activity increased following correct nose poke and, around the time that animals acquired the cue-reward task, following tone onset. However, earlier in training, incorrect nose pokes that resulted in a timeout were also followed by ACh release, although this was lower in magnitude.
Further, stimulating NBM-BLA cholinergic terminals during learning enhanced behavioral performance, but was not intrinsically rewarding on its own and did not support responding for the tone alone. Although ACh was released in the BLA at discrete points during the task, the effects of heightened BLA ACh signaling were relatively long lasting, since it was not necessary for stimulation to be time-locked to cue presentation or reward retrieval to enhance behavioral performance. Thus, cholinergic inputs from the basal forebrain complex to the BLA are a key component of the circuitry that links salient events to previously neutral stimuli in the environment and uses those neutral cues to predict future rewarded outcomes.

BLA ACh signaling and principal cell activity are related to cue-reward learning
We have shown that ACh release in the BLA is coincident with the stimulus that was most salient to the animal at each phase of the task. Use of the fluorescent ACh sensor was essential in determining these dynamics. Previous microdialysis studies have shown that ACh is released in response to positive, negative, or surprising stimuli, but this technique is limited by relatively long timescales (minutes) and cannot be used to determine when cholinergic transients align to given events in an appetitive learning task and how they evolve over time (Sarter & Lustig, 2020). In this cue-reward learning paradigm, when there was no consequence for incorrect nose-poking (Pre-Training phase), animals learned to perform a very high number of nose pokes and received a large number of rewards, and BLA ACh signaling peaked following correct nose pokes.
Both the behavioral response (nose poking that was not contingent on the tone) and the ACh response (linked to the correct nose poke) suggest that the animals were not attending to the tone during the Pre-Training phase of the task, but rather were attending to the cues associated with reward delivery, such as the reward light or the sound of the pump that delivered the reward. Consistent with this possibility, in the next phase of the task, when mice received a timeout for responding if the tone was not presented, performance of all groups dropped dramatically. Interestingly, in the early Training sessions, ACh release shifted to reward retrieval, likely because this was the most salient aspect of the task when the majority of nose pokes performed did not result in reward. Finally, as mice acquired the contingency between tone and reward availability, the tone also began to elicit ACh release in the BLA, suggesting that mice learned that the tone is a salient event predicting reward availability. Since there are multiple sources of ACh input to the BLA, it was important to determine whether NBM cholinergic neurons were active during the periods when ACh levels were high (Woolf, 1991). Recordings from cholinergic NBM-BLA terminal fibers showed similar dynamics to ACh measurements, suggesting that the NBM is a primary source of ACh across the phases of cue-reward learning.

Perhaps the most well-known example of dynamic responding related to learning cue-reward contingencies and encoding of reward prediction errors is the firing of dopaminergic neurons of the ventral tegmental area (VTA; Schultz, 1998).
After sufficient pairings, dopaminergic neurons will fire in response to the cue that predicts the reward, and no longer to the rewarding outcome, which corresponds with behavioral changes that indicate an association has been formed between conditioned stimuli (CS) and unconditioned stimuli (US). Plasticity related to learning has also been observed in cholinergic neurons in the basal forebrain complex during aversive trace conditioning, such that after several training days, neuronal activity spans the delay between CS and US (Guo et al., 2019). Additionally, a recent study suggested that ACh may signal a valence-free reinforcement prediction error (Sturgill et al., 2020). Future studies on the selective inputs to NBM-to-BLA cholinergic neurons would be of interest to identify the links between brain areas involved in prediction error coding.

We found that BLA principal cells were most reliably activated following reward retrieval before contingency acquisition (both when mice were receiving several rewards but no timeouts in Pre-Training and few rewards early in Training). Similar to the recording of ACh levels, after acquisition, the tone began to elicit an increase in BLA principal cell population activity. However, activity of principal neurons differed from ACh signaling in the BLA in important ways. ACh was released in response to the salient events in the task that were best able to predict reward delivery or availability. In contrast, the activity of BLA principal neurons was not tightly time-locked to correct nose poking, and instead followed reward retrieval until acquisition, when activity increased in response to tone onset.
The divergent dynamics of ACh release and principal neuron activity underscore that ACh's role in the BLA is to modulate, rather than drive, the activity of principal neurons, and that ACh may therefore alter dynamics of the network through selective engagement of different populations of GABA interneurons (Unal et al., 2015).

Increasing BLA acetylcholine levels enhances cue-reward learning
Neuronal activity and plasticity in the BLA are required for both acquisition of appetitive learning (conditioned reinforcement) and fear conditioning; however, the inputs that increase activity in the structure during salient events likely come from many brain areas (

inputs to the BLA are important for acquisition of conditioned reinforcement and for linking the rewarding properties of addictive drugs to cues that predict their availability (Cador et al., 1989). Our results indicate that ACh is a critical neuromodulator upstream of the BLA that is responsive to salient events, such as reward availability, motor actions that elicit reward, and cues that predict reward. We show here that increasing endogenous ACh signaling in the BLA caused mice to perform significantly better than controls in an appetitive cued-learning task. Heightened ACh release during learning of a cue-action-reward contingency led to fewer incorrect responses and increased acquisition rate in both female and male mice. The optical stimulation was triggered by correct nose poke; thus, the cholinergic NBM-BLA terminal fiber stimulation overlapped with all three salient events (tone, nose poke, and reward retrieval), since the tone terminated 2 sec after correct nose poke. Therefore, the initial optical stimulation of ACh release coincided with the tone and correct nose poke from the beginning of training in ChR2 mice, approximating the ACh signature in mice that had already acquired the cue-reward contingency. We hypothesize that it was this premature increase in ACh levels at the time of cue presentation that was important in allowing the animals to learn the contingency earlier. It is possible that ACh increased learning by increasing the intensity of the reward, potentiating the learned association, improving discrimination, or a combination of these phenomena.
However, increasing ACh release in the BLA was not inherently rewarding, because it did not support self-stimulation or real-time place preference. This is at odds with a recent study that found stimulation of NBM-BLA cholinergic terminals could induce a type of

Cell-type-specific expression of AChRs and activity-dependent effects place cholinergic signaling at a prime position to shape BLA activity during learning. For instance, late-firing interneurons in the BLA exhibit nAChR-dependent EPSPs when no effect is seen on fast-

Surgical procedures
Surgical procedures for behavior were performed in fully adult mice at 4-6 months of age, age-matched across conditions. For viral infusion and fiber implantation, mice were anesthetized using isoflurane (induced at 4%, maintained at 1.5-2%) and secured in a stereotactic apparatus (David Kopf Instruments, Tujunga, CA). The skull was exposed using a scalpel and Bregma was determined using the syringe needle tip (2 µL Hamilton Neuros syringe, 30 gauge needle, flat tip; Reno, NV).

For fiber photometry surgeries, either 0.4 µL of AAV9 hSyn-ACh3.0 (Vigene Biosciences Inc.) to measure BLA ACh levels (Fig. 2A-E + S2.1-S2.2) or 0.5 µL of AAV1 Syn-FLEX-GCaMP6s-WPRE-SV40 (Addgene, Watertown, MA) to measure BLA principal cell calcium dynamics (Fig. 3 + S3.1-S3.2

Mice were allowed to recover in a cage without bedding, placed on a microwavable heating pad, before being returned to the home cage. For two days following surgery, mice received 5 mg/kg Rimadyl i.p. (Zoetis Inc., Kalamazoo, MI) as postoperative care.

For optical stimulation experiments (Fig. 4, 6 + Fig. S4.1-S4.4 + S6.1-S6.2)

For ex vivo electrophysiology experiments (Fig. 4B), the NBM was injected with DIO-ChR2-EYFP as described above, except mice were 8 weeks of age. The coronal brain slices containing the NBM were prepared after 2-4 weeks of expression. Briefly, mice were anesthetized with 1X Fatal-Plus (Vortech Pharmaceuticals, Dearborn, MI) and were perfused through the circulatory system, to cool down the brain, with an ice-cold (4°C), oxygenated cutting solution containing (mM): sucrose 220, KCl 2.5, NaH2PO4 1.23, NaHCO3 26, CaCl2 1, MgCl2 6 and glucose 10 (pH 7.3 with NaOH). Mice were then immediately decapitated with a guillotine; the brain was removed and immersed in the ice-cold (4°C), oxygenated cutting solution and trimmed to a small tissue block containing the NBM.
Coronal slices (300 µm thick) were

attenuating chambers allowed the patch cord to pass through. BLA ACh3.0 (Fig. 2A-E) and principal cell GCaMP6s (Fig. 3) fiber photometry recordings occurred in a darkened behavioral room outside of sound attenuating chambers due to steric constraints with rigid fiber photometry patch cords. Later behavioral chamber customization allowed NBM-BLA terminal fiber (Fig. 2F-J) and jRCaMP1b/ACh3.0 (Fig. S2.5) mice to be tested inside sound attenuating chambers. For fiber photometry experiments, a custom receptacle was 3D printed that extended the cup beyond the chamber wall to allow mice to retrieve the reward with more rigid patch cords. In addition, the modular test chamber lid was removed and the wall height was extended with 3D printed and laser cut acrylic panels to prevent escape. Each mouse was pseudo-randomly assigned to a behavioral chamber when multiple chambers were used, counterbalancing for groups across boxes.

there was no consequence for improper nose pokes, either in the active port outside the tone (incorrect nose pokes) or in the inactive port (inactive nose pokes). The number of inactive nose pokes was typically very low after shaping and was not included in analysis. After reward retrieval (receptacle entry following reward delivery), the receptacle light was turned off and the tone was presented again on a variable intertrial interval schedule with an average interval of 30 sec (VI 30), ranging from 10 to 50 sec (Ambroggi et al., 2008). After 4-5 days of tone training, mice progressed to the Training phase, which had the same contingency as Pre-Training except incorrect nose pokes resulted in a 5 sec timeout signaled by house light illumination, followed by a restarting of the previous intertrial interval. Extinction was identical to Training except no Ensure was delivered in response to correct nose pokes.
In order to promote task acquisition, mice that were not reliably increasing the number of rewards earned were moved to a VI 20 schedule after 9 days of VI 30 Training for BLA ACh3.0 or 6-7 days for BLA principal cell mice. The VI 20 schedule was only needed for the two groups that were trained outside of the sound attenuating chambers.

and F0, divided by F0, which was multiplied by 100 to yield % ΔF/F0. The % ΔF/F0 was calculated independently for both the signal (465 nm) and reference (405 nm) channels to assess the degree of movement artifact. Since little movement artifact was observed in the recordings (Fig. S2.1B-C, S2.3E-F, S3.1C-D, tan lines), the signal % ΔF/F0 was analyzed alone. The % ΔF/F0 was z-scored to give the final Z % ΔF/F0 reported here. For the BLA principal cell recordings (Fig. S3.1C-D), some mirroring of the signal channel was observed in the reference channel. This is likely because 405 nm is not the "true" isosbestic point for GCaMP and we were instead measuring some changes in calcium-unbound GCaMP rather than

added for alignment, meaning that no trials for that day had a latency that stretched the entire window. Only rewarded trials where the mouse entered the receptacle within 5 sec after nose poke were analyzed. Full or partial training days were excluded from analysis if there were acquisition issues such as the patch cord losing contact with the fiber or behavioral apparatus malfunction. Lack of trials for analysis or recording issues led to missing rows of fiber photometry data in the heatmap despite having behavioral data, in which case these rows were skipped rather than adding entire blank rows.
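The % ΔF/F0 and z-scoring steps described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the definition of F0 is not given in the surviving text, so the median of the trace is used here purely as a hypothetical baseline, and the z-score uses the population standard deviation as a further assumption. The function names and the example trace are invented for illustration.

```python
from statistics import mean, median, pstdev


def percent_dff(trace, f0):
    """Return % dF/F0 for each sample: (F - F0) / F0 * 100."""
    return [(f - f0) / f0 * 100.0 for f in trace]


def zscore(values):
    """Z-score a sequence using its own mean and population SD (assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


# Made-up 465 nm signal samples (arbitrary units), for illustration only.
signal_465 = [100.0, 102.0, 110.0, 104.0, 100.0, 98.0]

# Hypothetical baseline: the source does not state how F0 was chosen.
f0 = median(signal_465)

dff = percent_dff(signal_465, f0)   # % dF/F0
z_dff = zscore(dff)                 # Z % dF/F0
```

The same two functions could be applied separately to the 405 nm reference channel to compare the channels for movement artifact, as the text describes.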
Due to individual differences in behavior, across-mouse average data was calculated by using a selection of days in which behavior was roughly similar or milestones such as first and last day of Pre-Training, first day earning 10 rewards in Training, first day crossing acquisition threshold (and maintaining afterward), last day of Training, and last day of Extinction (with 4 or more rewarded trials that met analysis criteria). Additional days were included in across-mouse average heatmaps when possible. Incorrect nose poke heatmaps were generated by averaging signals for 5 sec before and 5 sec after incorrect nose pokes that were not preceded by an incorrect nose poke in the last 5 sec. The incorrect nose poke heatmaps averaged across mice were generated using the same selection of days as the combined action heatmaps for a given experiment.
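The incorrect nose poke selection and peri-event averaging described above can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the function names, the sampling rate, and the example data are hypothetical, and treating a gap of exactly 5 sec as "preceded" is an assumption.

```python
def isolated_events(times, refractory=5.0):
    """Keep events not preceded by another event within `refractory` seconds.

    Assumption: a gap of exactly `refractory` seconds counts as preceded.
    """
    kept = []
    prev = None
    for t in sorted(times):
        if prev is None or t - prev > refractory:
            kept.append(t)
        prev = t  # every event, kept or not, can "precede" the next one
    return kept


def peri_event_average(trace, fs, event_times, pre=5.0, post=5.0):
    """Average trace windows spanning `pre` s before to `post` s after events.

    Events whose window would run past either end of the trace are skipped.
    """
    n_pre = int(round(pre * fs))
    n_post = int(round(post * fs))
    windows = []
    for t in event_times:
        i = int(round(t * fs))
        if i - n_pre < 0 or i + n_post > len(trace):
            continue
        windows.append(trace[i - n_pre:i + n_post])
    if not windows:
        return []
    n = len(windows)
    return [sum(col) / n for col in zip(*windows)]


# Made-up incorrect nose poke times (sec) and a made-up trace sampled at 1 Hz.
pokes = [1.0, 3.0, 9.0, 20.0, 24.0, 27.0, 40.0]
kept = isolated_events(pokes)
trace = [float(i) for i in range(50)]
avg = peri_event_average(trace, 1.0, kept)
```

In this toy example the poke at 1.0 sec passes the isolation filter but is dropped from the average because its 5 sec pre-event window would start before the recording does.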

Locomotor Activity
Optical Stimulation: Mice were placed in a square box (47 cm x 47 cm x 21 cm) for 20 min with a floor of filter paper that was changed between mice. During the third 5 min bin of the session, mice received optical stimulation (20 sec on/off, 20 Hz, 25 ms pulses). Locomotor activity was recorded via overhead camera and analyzed in 5 min bins with EthoVision.

Antagonists: Locomotor data was collected using an Accuscan Instruments (Columbus, Ohio) behavior monitoring system and software. Mice were individually tested in empty cages, with bedding and nesting material removed to prevent obstruction of infrared beams. Mice were injected (i.p.) with saline, mecamylamine (1 mg/kg, Sigma), scopolamine (0.5 mg/kg, Sigma), or mecamylamine+scopolamine (1 mg/kg and 0.5 mg/kg, respectively) 30 min before locomotor testing. Locomotion was monitored for 20 min using 13 photocells placed 4 cm apart to obtain an ambulatory activity count, consisting of the number of beam breaks recorded during a period of ambulatory activity (linear motion rather than quick, repetitive beam breaks associated with behaviors such as scratching and grooming).
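To make the optical stimulation parameters for the locomotor test concrete (20 sec on/off, 20 Hz, 25 ms pulses, delivered during the third 5 min bin), the sketch below lists pulse onset times in milliseconds. It assumes the bin begins with an "on" period and that each train starts with a pulse; neither detail is stated in the text, and the function name is hypothetical.

```python
def pulse_onsets_ms(start_ms, end_ms, on_ms=20_000, off_ms=20_000, freq_hz=20):
    """Return pulse onset times (ms) for alternating on/off stimulation trains.

    Assumption: the first train begins at `start_ms` (the bin starts "on").
    """
    period_ms = 1000 // freq_hz  # 50 ms between pulse onsets at 20 Hz
    onsets = []
    train_start = start_ms
    while train_start < end_ms:
        train_end = min(train_start + on_ms, end_ms)
        onsets.extend(range(train_start, train_end, period_ms))
        train_start += on_ms + off_ms
    return onsets


PULSE_WIDTH_MS = 25

# Third 5 min bin of a 20 min session: 10 min to 15 min.
onsets = pulse_onsets_ms(10 * 60 * 1000, 15 * 60 * 1000)

pulses_per_train = 20_000 // (1000 // 20)          # pulses in one 20 sec train
duty_cycle_in_train = PULSE_WIDTH_MS / (1000 / 20)  # fraction of time light is on
total_light_on_s = len(onsets) * PULSE_WIDTH_MS / 1000
```

Under these assumptions the bin contains eight trains, and within each train the light is on half of the time.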

Light/Dark Box Exploration
A rectangular box was divided evenly into a light (clear top, illuminated by an 8W tube light) and dark (black walls, black top) side with a black walled divider in the middle with a small door. The lid and divider were modified to allow the optical fiber and patch cord to pass through freely. Mice were placed facing the corner on the light side furthest from the divider and the latency to cross to the dark side was measured. The number of crosses and time spent on each side were measured for 6 min following the initial cross.
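The three measures described above (latency to the first cross, number of crosses, and time on each side during the 6 min after the first cross) could be derived from a log of side entries roughly as follows. This is a hypothetical sketch: the log format and function name are invented, the initial cross is not counted among the crosses (an assumption), and the recording is assumed to cover the full 6 min window.

```python
def light_dark_summary(entries, window_s=360.0):
    """Summarize a light/dark box session.

    `entries` is a time-ordered list of (time_s, side) tuples, one per entry
    into a side, starting with (0.0, "light"). Returns None if the mouse
    never crosses to the dark side.
    """
    first_cross = next((t for t, side in entries if side == "dark"), None)
    if first_cross is None:
        return None
    end = first_cross + window_s
    time_in = {"light": 0.0, "dark": 0.0}
    crosses = 0
    for k, (t, side) in enumerate(entries):
        if t < first_cross or t >= end:
            continue
        # Time on this side lasts until the next entry or the window end.
        next_t = entries[k + 1][0] if k + 1 < len(entries) else end
        time_in[side] += min(next_t, end) - t
        if t > first_cross:
            crosses += 1  # assumption: the initial cross itself is not counted
    return {"latency_s": first_cross, "crosses": crosses, "time_in": time_in}


# Made-up session log, for illustration only.
log = [(0.0, "light"), (30.0, "dark"), (100.0, "light"),
       (160.0, "dark"), (300.0, "light"), (420.0, "dark")]
summary = light_dark_summary(log)
```

In the example, the entry at 420 sec falls after the 6 min window (which ends at 390 sec) and is therefore ignored.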