Two-Photon Imaging of Striatum Demonstrates Distinct Functions for Striosomes and Matrix in Reinforcement Learning

Despite the discovery of striosomes several decades ago, technical difficulties have hampered the study of their functions. Here we used 2-photon calcium imaging in neuronal birthdate-labeled Mash1-CreER mice to image simultaneously the activity of striosomal and matrix neurons in vivo. We report that with this method we can visually identify circumscribed zones of neuropil that correspond to striosomes as verified in immunostained sections. We find that striosomal neurons, relative to matrix neurons, preferentially encode reward-predicting cues, and that their activity contains more information about expected outcome. These characteristics emerge during training and further strengthen during overtraining. Both striatal compartments are active similarly after reward delivery, firing at neuron-specific times during or after consummatory licking. Finally, we find that immediate reward history strongly modulates neuronal activation in the next trial, especially in matrix neurons. These results suggest that striosomes and matrix have distinct functions in relation to reinforcement learning.


The striatum, despite its relatively homogeneous appearance in simple cell stains, is made up of a mosaic of macroscopic zones, the striosomes and matrix, which differ in their input and output connections and are thought to allow specialized processing by physically modular groupings of striatal neurons (Crittenden et al., 2016; Fujiyama et al., 2011; Gerfen, 1984; Graybiel and Ragsdale, 1978; Jiménez-Castellanos and Graybiel, 1989 [...] modules are the striosomes (also called patches), which are distinct from the surrounding matrix by their differential expression of neurotransmitters, receptors and many other gene expression patterns (Banghart et al., 2015; Cragg, 2015, 2017; Crittenden and Graybiel, 2011; Cui et al., 2014; Gerfen, 1992; Graybiel, 2010; Graybiel and Ragsdale, 1978). Striosomes in the anterior striatum have strong inputs from regions related to the limbic system, including parts of the orbitofrontal and medial prefrontal cortex (Eblen and Graybiel, [...] 2012; Watabe-Uchida et al., 2017; Watabe-Uchida et al., 2012). By imaging day by day during the acquisition and overtraining periods of the task, we asked whether these patterns changed in systematic ways with experience. Finally, we tested the effect of reward history on the activity patterns of current trials, given reports that strong reward-history activity has been found in sites considered to be directly or indirectly connected with striosomes (Bromberg-Martin et al., 2010; Hamid et al., 2016; Tai et al., 2012).

We demonstrate that neurons within visually identified striosomes encode reward-predicting tones more strongly than do those of the surrounding matrix, but that matrix neurons are more strongly modulated by reward history, especially during extended post-reward periods. These activity patterns develop during learning with different dynamics for cue responses and post-reward responses.
These findings suggest that neurons in striosomes and matrix can be differentially tuned by reinforcement contingencies both during learning and during subsequent performance. Finally, this work opens the opportunity for future functional understanding of striosome-matrix architecture by 2-photon microscopy and selective tagging of neurons with known developmental origins, an opportunity that will be valuable in the study of both normal animals and those representing models of disease states.

To detect striosomes, we performed experiments in Mash1-CreER X Ai14 mice injected with tamoxifen at E11.5. This pulse labeling method (Kelly et al., 2017) resulted in strong tdTomato labeling of clusters of SPNs and surrounding regions of neuropil in the striatum (Figure 1). In initial experiments, we confirmed that these clusters corresponded to striosomes by the close match between the zones of tdTomato neuropil labeling and mu-opioid receptor 1 (MOR1)-rich zones observed in immunohistologically prepared sections (Tajima and Fukuda, 2013). We also observed some tdTomato-labeled cells outside of MOR1-labeled striosomes, scattered sparsely in the extra-striosomal matrix.

For in-vivo experiments, we used 2-photon microscopy to image the striatum of 5 striosome-labeled mice that had received unilateral intrastriatal injections of AAV5-hSyn-GCaMP6s and had been implanted with cannula windows and a headplate (Figure 2). Each mouse was trained on a classical conditioning task in which 2 auditory tones (1.5-s duration each) were associated with reward delivery by different probabilities (tone 1, 80% vs tone 2, 20%) (Figure 2A). Inter-trial intervals were 7 ± 1.75 s. With training, mice began to lick in anticipation of the reward, and the amount of this anticipatory licking became greater when cued by the tone indicating a high probability (80%) of reward (Figure 2B). We calculated a learning criterion based on the anticipatory lick rates during the two cues and the subsequent delay period (0.5 s). Mice exhibiting a divergence in anticipatory licking for the two cues for at least two out of three consecutive sessions were considered as trained (Figure 2C). We performed imaging during training (task acquisition) and after this criterion had been reached (trained, Figure 2D).
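The "two out of three consecutive sessions" criterion can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `first_trained_session` is hypothetical, the per-session significance flags are assumed to have been computed beforehand, and the choice to report the session at which the second significant result falls within a three-session window is our reading of the criterion, not a detail given in the text.

```python
def first_trained_session(significant, window=3, required=2):
    """Return the 0-based index of the first session at which the learning
    criterion is met, or None if it is never met.

    `significant` holds one boolean per session: True if anticipatory licking
    differed significantly between the two cues in that session (assumed to
    be computed elsewhere). The criterion is met once at least `required` of
    the most recent `window` sessions are significant.
    """
    for i in range(len(significant)):
        recent = significant[max(0, i - window + 1): i + 1]
        if sum(recent) >= required:
            return i
    return None


# Hypothetical example: sessions 2 and 4 are significant, so the criterion
# is met at session index 4 (two of the last three sessions).
flags = [False, False, True, False, True, True]
print(first_trained_session(flags))
```

Sessions before the returned index would then be treated as acquisition sessions, and later ones as trained sessions.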

Imaging of striosomes
Clusters of tdTomato-positive neurons were clearly visible in vivo in the 2-photon microscope at 40x magnification, and the neuropil of these neurons delimited zones in which many dendritic processes could be identified (Figure 3). We simultaneously recorded striosomal and matrix neurons from fields of view with clear striosomes. In all animals, we could see at least two different striosomes from which we imaged at least five different non-overlapping fields of view. In some instances, we could see two different striosomes in one field of view. In the entire data set, we imaged 1867 neurons in striosomes and 4453 in the matrix. Because striosomes form parts of extended branched labyrinths, it was possible to follow some striosomes through ±100 µm in depth, and across ±800 µm in the field of view. During training, we rotated through the fields of view, but after training criterion had been reached, we recorded activity in unique non-overlapping fields of view (2704 neurons, of which 727 were in striosomes; between 252-782 neurons per mouse).

Striatal neurons exhibit heightened activity during different task epochs

As an initial approach to our data, we analyzed the overall fluorescence for every session in trained animals by averaging the frame-wide fluorescence (Figure 4A). Both cues evoked a large response in the neuropil signal, which was larger for the high-probability cue. After reward delivery, there was a prolonged, strong activation that peaked around 5 s after reward delivery. To determine more precisely the nature of this activation, we aligned neuronal responses in the rewarded trials to the tone onset, to the first lick after reward delivery and to the end of the licking bout (Figure 4B).
This analysis demonstrated that, in addition to the tone response, there was an increase in signal after the start of the consummatory licking period, and this signal increased over time and peaked at the time of the last lick, and then subsided.

Next, we analyzed single-cell activity to investigate the neural dynamics of task encoding by the striatal neurons. In particular, we asked whether the prolonged activation seen in the frame-wide fluorescence signal was also visible in single cells, or whether individual neurons were active during different task events. Neuronal firing was sparse during the task, but we found that individual neurons were active during particular events in the task (Figure 4D,G). For instance, the red color-coded cell illustrated in Figure 4C and D became active soon after tone onset, whereas the neuron color-coded in gray fired during the post-reward licking period. The timing of their activities with respect to specific trial events seemed relatively stable, resembling what has been reported before for neurons in the striatum of behaving rodents by recording and analyzing spike activity (Bakhurin et al., 2017; Barnes et al., 2011; Gage et al., 2010; Jog et al., 1999; Rueda-Orozco and Robbe, 2015). To determine task encoding by single neurons at a population level, we defined task-modulated neurons as those that were active during any epoch of the task (see Materials and methods). Overall, 38.2% of the striatal neurons imaged in our samples were task-modulated. Of these, most were active during only one of the three task epochs (85%).
Among task-modulated neurons, most were selectively active during the post-reward licking period (57%), but substantial numbers of neurons were also active during the tone presentation (17%) or after the licking had stopped (11%, Figure [...]

To dissociate the specific contributions of striosomes and matrix to task encoding, we again first investigated aggregate neuronal responses in both striatal compartments. We drew regions of interest (ROIs) around striosomes and around regions of the matrix with similar overall intensity of fluorescence and size, and then compared the total amount of fluorescence from these regions in trained animals. This analysis demonstrated a stronger tone-evoked activation in striosomes than in the nearby matrix regions sampled (Figure 5A) (ANOVA main effect p < 0.001). Moreover, the high-probability tone cue evoked a larger response than the low- [...] matrix neurons during these epochs, we analyzed population-averaged activity aligned to different task events (Figure 5G-J). As in our neuropil analyses, we found that individual striosomal neurons were more robustly active than individual matrix neurons during the cue epoch of the task (Figure 5G, ANOVA main effect p < 0.001). Moreover, the high-probability tone elicited a higher response than the low-probability tone (p < 0.001).

We used an AUROC analysis to compare activity in trials that were rewarded (aligned to first lick after reward delivery) or unrewarded (aligned to 2 s after cue onset, a time period matching that for the rewarded trial analysis). We found that striosomal neurons were more selective for rewarded trials (Figure 5H, p < 0.001). Interestingly, the selectivity for reward was greater for low-probability than for high-probability tone trials (p < 0.01).
While cells in the two compartments responded similarly during post-reward licking (Figure 5I, p > 0.05), striosomes had a higher response during the post-licking period (Figure 5J, p < 0.01). Together, these findings demonstrate that striosomes are more task-modulated in this appetitive classical conditioning task than the nearby matrix and that they are particularly more active during reward-predicting cues.

To determine how these responses were shaped by training, we analyzed striatal activity during the acquisition period of the task. To quantify levels of learning, we tested for significance in the difference between anticipatory licking for high- and low-probability cues during the tone presentation and the reward delay. If mice exhibited a significant difference on 2 out of 3 consecutive days, we considered them as being trained. Sessions performed before this criterion was met were categorized as acquisition sessions. This categorization allowed us to ask whether the strong striosomal cue-related response was a sensory feature, or whether it was an acquired response related to the meaning of the stimulus. Activity measures for the neuropil signals, comparing signals for all sessions before the mice reached the learning criterion with signals of all the sessions afterwards, showed that the response to the tones in striosomes was much stronger after animals learned the task (Figure 6A). The neuropil signal in striosomes was significantly higher after the task performance reached the training criterion (p < 0.05). This effect was not observed in the matrix (p > 0.05; ANOVA interaction p < 0.001).
Single-cell analysis further indicated that during training, the percentage of task-modulated neurons increased steadily (Figure 6B), and that when mice reached the learning criterion (sessions 11-12 for the mice shown in Figure 6B,C), there was a rapid increase in the proportion of cue-modulated neurons (Figure 6C).

We further tested whether there was a sudden step-like increase in striosomal tone signaling. We averaged the z-scores of the activity of all task-modulated neurons during the last 5 sessions before and during the first 5 sessions after the learning criterion had been reached (Figure 6D). Averaging the activity in these two groups of sessions indicated a clear increase in striosomal signaling during the tone (Figure 6E). This increase was significant (Figure 6F, ANOVA, training main effect p < 0.001; interaction p < 0.05). In addition to this development of a tone response later in training, there was a tone-related activation in the sessions in which the animals were first exposed to the task, perhaps reflecting a startle or novelty signal effect. This tone-related activation disappeared after 1-3 sessions and reemerged later as mice learned the task. Analysis of neuronal activity during the post-reward licking period showed that during training there was an increase in the percentage of neurons that responded during this period (Figure 6G). Averaging the activity of all task-modulated neurons during training showed that there was an increase of activity in the period after reward delivery (Figure 6H). In contrast to the increases in tone response, this reward-period increase occurred several sessions before mice learned the task.

During overtraining, striosomal tone-related responses intensify and become more selective for high-probability tones

To investigate further the relationship between neuronal responses and learning, two mice were trained for an additional five sessions. In these sessions, we imaged again the same fields of view that were recorded after these two mice reached training criterion. These last 5 sessions were considered overtraining sessions. The tone-evoked aggregate response became notably higher and sharper during this phase (Figure 7A). By contrast, the calcium signals that occurred immediately after reward delivery declined, resembling previously reported task-bracketing patterns (Barnes et al., 2005; Jin and Costa, 2010; Jog et al., 1999; Smith and Graybiel, 2013; Thorn et al., 2010). The increase in responses related to the tone during overtraining was particularly strong in striosomes. In the period following reward delivery, the signal initially dropped compared to the earlier sessions but subsequently reached the same magnitude. We quantified the peak response during the tone presentation period for the training, post-training and overtraining sessions (Figure 7B), comparing the responses of striosomal and matrix samples, and found a highly significant interaction (ANOVA interaction p < 0.005). In the trained and overtrained mice, striosomes had significantly higher tone-evoked responses than did the matrix (paired t-test, trained mice p < 0.01 and overtrained mice p < 0.05). The striosomal neuropil responses also became more selective for the high-probability cue during overtraining (Figure 7C), so that during overtraining the striosomal response was significantly larger than the matrix response (paired t-test p < 0.05).

We next analyzed the number of task-modulated neurons during the overtraining period.
This percentage grew with training, then slightly dropped again during overtraining ( [...] At all stages, there were more striosomal than matrix task-modulated neurons (Fisher's exact test, p < 0.01). By contrast, the percentage of cue-modulated cells (Figure 7D, second panel) grew further (striosomes: 4.1% during training, 15.0% after training and 21.1% during overtraining; matrix: 2.3% during training, 8.1% after training and 13.9% during overtraining). There were more tone-modulated neurons during acquisition, after training and during overtraining (Fisher's exact test, p < 0.05). The percentage of cells that were active during the consummatory licking period (Figure 7D, third panel) also increased (striosomes: 7.0% during training, 28.9% after training and 14.9% during overtraining; matrix: 5.6% during training, 27.0% after training and 13.3% during overtraining), but there was no difference between striosomes and matrix (Fisher's exact test, p > 0.05). The percentage of cells that were active after the end of licking remained stable during overtraining (Figure 7D, fourth panel; striosomes: 1.9% during training, 5.6% after training and 6.6% during overtraining; matrix: 1.1% during training, 7.3% after training and 6.8% during overtraining), with no differences between striosomes and matrix at any training stage (Fisher's exact test, p > 0.05). At no stage during training was the percentage of neurons that were significantly modulated during the response period or after the last lick different between striosomal and matrix neurons (Fisher's exact test, p > 0.05). The limited number of significantly modulated neurons in these two mice was too low to make statistical comparisons of the neuronal responses.
Nevertheless, the findings for the entire performance period of the mice collectively demonstrate that the activity patterns observed after training were acquired as mice learned the task, that the striosomal encoding of the tone became stronger than that of the matrix, that this response emerged at the time the animals began differentially responding to the tones, and that this response developed further during overtraining, becoming larger and more selective for the high-probability cue.

Matrix responses are more sensitive to outcome history

In the classical conditioning task employed in this study, mice used the auditory tone presented during the cue epoch to guide their expectation for receiving a reward on the current trial. We examined their licking responses as a proxy for such expectation in order to ask whether, in addition to the information provided by the cue, the mice used the outcome in the previous trial to tailor their reward expectation in the current trial. In trials following rewarded trials, mice showed increased anticipatory licking during the cue and reward delay (Figure 8A; n = 33 sessions from five mice; p < 0.001, Wilcoxon signed-rank test), but licking during the post-reward period was unaffected by outcome on the previous trial (p > 0.05, Wilcoxon signed-rank test). To determine whether the task-related activity of the striatal neurons in our sample was also modulated by outcome history, we compared activity in trials preceded by a rewarded trial or by an unrewarded trial, regardless of the cue type (high or low probability) presented on the current trial. We first analyzed the effect of reward history on the cue-period responses of single task-modulated neurons and found that activity was slightly greater when the previous trial was rewarded (mean z-scores: 0.21 ± 0.01 vs. 0.17 ± 0.01 for previously rewarded and unrewarded; p < 0.01).
When we analyzed the effect of outcome history on neural responses observed during post-reward licking in currently rewarded trials, we found that the activity of a subset of striatal neurons was highly sensitive to outcome in previous trials (Figure 8B). We observed enhanced activity during post-reward licking when the previous trial was unrewarded, compared to when the previous trial was rewarded. Similarly, population-averaged responses of task-modulated neurons were significantly higher when the previous trial was unrewarded, as compared to when it was rewarded (p < 0.001, Wilcoxon rank-sum test). Importantly, post-reward licking behavior was invariant to previous trial outcome, making it unlikely that the observed changes in neural activity were related to changes in the motor output during reward consumption.

To determine how far back in time we could detect an outcome history effect, we computed a history modulation index (see Materials and methods) for currently rewarded trials with two types of reward history. In the first group, we separated rewarded trials based on whether the previous trial was rewarded or unrewarded (one trial back). For the second group, we disregarded the outcome status in the immediately preceding trial and separated trials depending on the outcome status of two trials in the past (two trials back). This analysis showed that recent reward history has a stronger influence on post-reward licking responses of task-modulated neurons than trials farther back in the past (Figure 8C [...]

By the time when the animals had reached the learning criterion, striosomes, examined both by averaged neuropil measures and by single-cell activities, were more responsive to the task than the surrounding matrix neurons. The differential activation of striosomes was particularly striking for the reward-predictive cues.
More striosomal neurons were active in relation to the cues, and this effect grew stronger as animals learned. The striosomal neurons also were more selective for the high-probability cue. These responses did not reflect an overall greater response of striosomes to all conditions; for example, their responses were less sensitive than those of the matrix neurons to immediate reward history.

Outcome period activity

Over the task-related population, the highest activity levels for many of the neurons as the learning criterion was reached occurred during the outcome period, whether the neurons were in striosomes or in the surrounding matrix. During this period, overall neuronal activity built up and peaked at the end of the licking. However, several factors pointed to this response as being different from a pure motor response to the licking movements. Most strikingly, even among the neurons strongly active during the prolonged licking period, the majority rose to their peak activity at specific times within this period rather than during the entire licking period. These [...] multiplexing of information about licking, reward history, timing with respect to task events, and reward prediction. Importantly, we found the same stronger tone modulation in striosomes when we analyzed neuropil activity. In these analyses, we obtained matched, simultaneously registered striosomal and matrix data points from every session during the same behavioral performance. For this reason, the differences that we observed in the activity of the striatal compartments cannot be related to differences in licking behavior, as the behaviors were identical.

Sensitivity to reward history
In contrast to these accentuated responses of striosomes, the striosomal neurons as a population were less sensitive than those in the matrix to immediate reward history. When the learning criterion had been reached, the neuronal responses for a given trial were elevated when the previous trial was not rewarded. By contrast, anticipatory licking was decreased in trials following unrewarded trials. These effects were significantly larger for the matrix. This reward history effect did not occur for two-back reward history, suggesting that it reflected immediate reward history. [...] differentially encode reward prediction error signals. One particular possibility is that striosomes through their GABAergic innervation of dopamine-containing neurons could signal a negative reward prediction signal. However, we found that striosomes preferentially encoded reward-predictive cues. We did not find differences between striosomes and matrix in outcome-related activity. We also did not find signals related to reward omissions in either striosomes or matrix. We are aware that the dorsal striatum is heavily implicated in motor behavior, through learning, action selection or perhaps the invigoration of action (Apicella et [...]

Our findings are confined to the analysis of a very simple task, and they clearly are unlikely to have uncovered the range of functions of the striosome and matrix compartments. Yet the differences detected already suggest that striosomal neurons could be more responsive to the immediate contingencies of events than nearby matrix neurons, that they could gain this enhanced sensitivity by virtue of learning-related plasticity, but that they could be less sensitive to immediately prior reward history. These attributes of the striosomes could be related to real-time direction of action plans based on real-time estimates of value.
To the best of our knowledge, this is the first report of simultaneous recording of visually identified striosome and matrix compartments in the striatum, here made possible by the neuropil labeling in pulse-labeled Mash1-CreER mice.

Virus injections
Adult Mash1(Ascl1)-CreER x Ai14 mice received virus injections during aseptic stereotaxic surgery at 7-10 weeks of age. They were deeply anesthetized with 3% isoflurane, were then head-fixed in a stereotaxic frame, and were maintained on anesthesia with 1-2% isoflurane. Meloxicam (1 mg/kg) was subcutaneously administered, the surgical field was prepared and cleaned with betadine and 70% ethanol, and based on pre-determined coordinates, the skin was incised, the head was leveled to align bregma and lambda, and two holes (ca. 0.5 mm diameter) were drilled in the skull. Two injections of AAV5-hSyn-GCaMP6s-wpre-sv40 (0.5 µl each, University of Pennsylvania Vector Core) were made, one per skull opening, to favor widespread transfections of striatal neurons at the following coordinates relative to bregma: 1) 0.1 mm anterior, 1.9 mm lateral, 2.7 mm ventral and 2) 0.9 mm anterior, 1.7 mm lateral and 2.5 mm ventral. Injections were made over 10 min, and after a ~10-min delay, the injection needles were slowly retracted. The incision was sutured shut, the mice were kept warm with wet food during post-surgical recovery, and they were given meloxicam (1 mg/kg, subcutaneous) for 3 days to provide analgesia.

Cannula implantation

We assembled chronic cannula windows by adhering a 2.7-mm glass coverslip to the end of a stainless steel metal tubing (1.6-1.8 mm long, 2.7 mm diameter; Small Parts) using UV curable glue (Norland). Cannula windows were kept in 70% ethanol until used for surgery. At 20-40 days after virus injection, mice were water restricted, and a second surgery was performed under deep isoflurane anesthesia as before to allow insertion of a cannula for imaging (Dombeck et al., 2010; Howe and Dombeck, 2016; Lovett-Barron et al., 2014) and mounting of a headplate to the skull for later head fixation.
Bregma and lambda were aligned in the horizontal plane, and the anterior and lateral coordinates for the craniotomy were marked (0.6 mm anterior and 2.1 mm lateral to bregma). The skull was then tilted and rolled by 5° to make the skull surface horizontal at the location of cannula implantation. A 3 mm diameter craniotomy was made with a trephine dental drill. The exposed cortical tissue overlying the striatum was aspirated using gentle suction and constant perfusion with cooled, autoclaved 0.01 M phosphate buffered saline (PBS), and part of the underlying white matter was removed. A thin layer of Kwiksil (WPI) was applied, and the chronic cannula was inserted into the cavity. Finally, metabond (Parkell) was used to secure the implant in place and to attach a headplate to the skull. The mice received the same post-surgical care as described above.

Behavioral training
When mice had recovered from surgery and the optical window had cleared, they were water deprived (1-1.5 ml per day) and habituated to head-fixation for on average 5 days. During head fixation, the mice were held in a polyethylene tube that was suspended by springs. When they showed no clear signs of stress and readily drank water while being head-fixed, behavioral training was begun. Water was delivered through a tube controlled by a solenoid valve located outside of the imaging setup, and licking at the spout was detected by a conductance-based method (Slotnick, 2009). In the behavioral training protocol, 2 tones (4 or 11 kHz, 1.5-s duration) were played in a random order. The tones predicted reward delivery (5 µl) with, respectively, an 80% or 20% probability. In each trial, there was a 500-ms delay after tone offset before reward delivery. Inter-trial intervals were randomly drawn from a flat distribution between 5.25 and 8.75 s. Training was considered to be complete when there was a significant difference in anticipatory licking during the cue period (two-sided t-test, α = 0.05). Imaging was performed daily during training and continued for 3-7 sessions afterwards. Two mice were then given 5 overtraining sessions. [...]

Calcium imaging data were acquired using PrairieView acquisition software and were saved into multipage TIF files. Data were analyzed by using custom scripts written in ImageJ (National Institutes of Health) or Matlab (Mathworks). Images were first corrected for motion in the X-Y axis by registering all images to a reference frame. We used the pixel-wise mean of all frames in the red channel containing the structural tdTomato signal to make a reference image. All red channel frames were re-aligned to the reference image by the use of 2-dimensional normalized cross-correlation (template matching and slice alignment plugin (Tseng et al., 2011)).
The green channel frames containing the GCaMP6s signal were then realigned using the same translation coordinates with the 'Translate' function in ImageJ. After realignment, ROIs were manually drawn over neuronal cell bodies using standard deviation and mean projections of the movies. With custom MATLAB scripts, we drew rings around the cell body ROIs (excluding other ROIs) to estimate the contribution of the background neuropil signal to the observed cellular signal. Fluorescence signal for each cell was computed by taking the pixel-wise mean of the somatic ROIs and subtracting 0.7x the fluorescence of the surrounding neuropil, as previously described (Chen et al., 2013). After this step, the baseline fluorescence for each cell (F_0) was calculated [...]

To provide a first insight into striosomal and matrix signaling, we integrated the fluorescence signal from within an identified striosome and from a part of the matrix in the same field of view that had a similar size, background fluorescence and number of neurons. DFF, calculated as DFF = (F_t - F_0) / F_0, was normalized by calculating z-scores relative to the signal at the end (1 s) of inter-trial intervals to correct for relative differences between sessions. To determine the selectivity of responses to different task events, the AUROC curves were calculated. For cue selectivity, we calculated the AUROC curve by comparing the response during high- and low-probability cues. For the selectivity to rewarded trials, we calculated the AUROC by comparing rewarded and unrewarded trials for the two cues separately.
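The signal-processing chain described above (neuropil subtraction, DFF, z-scoring against a baseline window) can be sketched in a few lines. This is a minimal illustration rather than the authors' MATLAB code: the function names are hypothetical, the baseline fluorescence F_0 is passed in by the caller because its computation is not given in the surviving text, and the use of the population standard deviation for the z-score is an assumption.

```python
from statistics import mean, pstdev


def neuropil_corrected(soma, neuropil, factor=0.7):
    """Subtract `factor` times the surrounding-ring neuropil trace from the
    somatic trace, frame by frame (0.7 is the factor stated in the text)."""
    return [s - factor * n for s, n in zip(soma, neuropil)]


def dff(trace, f0):
    """DFF = (F_t - F_0) / F_0. `f0` is supplied by the caller; how it is
    estimated is not specified here (assumption: a single positive scalar)."""
    return [(f - f0) / f0 for f in trace]


def zscore(trace, baseline):
    """Z-score `trace` relative to a baseline window, for example the last
    1 s of the inter-trial interval. Population SD is an assumption."""
    mu = mean(baseline)
    sd = pstdev(baseline)
    if sd == 0:
        raise ValueError("baseline window has zero variance")
    return [(x - mu) / sd for x in trace]


# Hypothetical two-frame example.
corrected = neuropil_corrected([17.0, 27.0], [10.0, 10.0])
print(dff(corrected, f0=10.0))
```

Keeping the three steps as separate functions makes it easy to swap in a different neuropil factor or baseline window without touching the rest of the chain.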

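The AUROC selectivity measure used for the cue and reward comparisons can be illustrated with a small pairwise sketch. It assumes one scalar response per trial (for example, the mean z-scored signal in the analysis window); the function name `auroc` and the example values are hypothetical, and this is not the authors' implementation.

```python
def auroc(pos, neg):
    """Area under the ROC curve for two sets of per-trial responses.

    Equals the probability that a randomly chosen trial from `pos` (e.g.
    high-probability cue or rewarded trials) has a larger response than a
    randomly chosen trial from `neg`, with ties counted as one half.
    0.5 means no selectivity; values near 1 or 0 mean strong selectivity.
    """
    if not pos or not neg:
        raise ValueError("both trial groups must be non-empty")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-trial responses for the two cue types.
high_cue = [0.9, 1.4, 0.7, 1.1]
low_cue = [0.2, 0.8, 0.1, 0.5]
print(auroc(high_cue, low_cue))
```

The pairwise form is quadratic in the number of trials, which is acceptable for session-sized data; a rank-based formulation gives the same value more efficiently.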
Single cell analysis

The conditioning task had three epochs: cue, post-reward licking, and post-licking. To identify task-modulated neurons active during these epochs, we aligned the data either to tone onset, to the first lick after reward delivery, or to the end of licking. We compared the fluorescence values over the following time windows to a 1-s baseline preceding each event. For the tone-aligned data, mean fluorescence was calculated over a 2-s time window after tone onset separately for trials with either the high- or low-probability cues. Neurons that were significantly active in either of the cue conditions were considered to be task-modulated. To find neurons modulated during post-reward licking, GCaMP6 fluorescence was averaged between the time when the animal first licked to the reward and the time that it stopped licking. We also used a 1-s time window after end of licking for identifying task-modulated neurons during this period. In some trials, animals did not stop licking until start of the next trial. These trials were excluded from the analysis due to the difficulty in assigning licking end-time. For a neuron to be considered as task-modulated, we required that its activity exhibit a significant increase from baseline for any of the three alignments (two-sided Wilcoxon rank-sum test; α = 0.01, corrected for multiple comparisons). Neurons exclusively active during only one epoch of the task were considered to be selectively responsive during that period. Most neurons (>80%) were significantly active only during one of the epochs. To compare signals across neurons, we used z-score normalization of the DFF signals with a 1-s period before the cue as a baseline.
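The task-modulation test described above can be sketched as follows. Several points are assumptions made for illustration: the correction for multiple comparisons is taken to be Bonferroni across the alignments (the text does not name the method), the rank-sum p-value uses the tie-corrected normal approximation, and the function names are hypothetical.

```python
import math
from statistics import mean


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation with tie
    correction) for per-trial values `x` versus `y`."""
    pooled = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mid-rank for a run of ties
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(ranks[:n1])                      # rank sum of the first sample
    expected = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:
        return 1.0
    z = (w - expected) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def is_task_modulated(alignments, alpha=0.01):
    """`alignments` is a list of (baseline, response) pairs of per-trial mean
    fluorescence, one pair per alignment. A neuron counts as task-modulated
    if any alignment shows a significant increase over baseline.
    Bonferroni correction across alignments is an assumption."""
    threshold = alpha / len(alignments)
    return any(
        mean(resp) > mean(base) and rank_sum_p(resp, base) < threshold
        for base, resp in alignments
    )
```

Requiring `mean(resp) > mean(base)` alongside the two-sided test restricts the result to increases from baseline, matching the wording of the criterion.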
For analysis of the peak activity of task-modulated neurons, DFF signals were normalized to the maximum of the session-averaged activity for any particular alignment in order to compare peak activity times during the time interval of interest.

To determine whether reward outcome in the previous trial modulated licking behavior during the task, we first compared anticipatory licking in trials that were followed by either rewarded or unrewarded trials. We included all current trials, regardless of the cue or the [...]

[...] Anticipatory licking was significantly higher during sounding of the high-probability tone (blue) than during sounding of the low-probability tone (green). After reward delivery, licking rates were elevated for several seconds (solid lines: rewarded trials; dotted lines: unrewarded trials). (D) Mice began to exhibit differences in levels of anticipatory licking between the two cues after 11-12 sessions. Animals were considered to be trained when they had 2 out of 3 consecutive sessions with significantly higher anticipatory licking during the high-probability tone (blue). Shading represents SEM.