Striatal fast-spiking interneurons drive habitual behavior

Habit formation is a behavioral adaptation that automates routine actions. Habitual behavior correlates with broad reconfigurations of dorsolateral striatal (DLS) circuit properties that increase gain and shift pathway timing. The mechanism(s) for these circuit adaptations are unknown and could be responsible for habitual behavior. Here we find that a single class of interneuron, fast-spiking interneurons (FSIs), modulates all of these habit-predictive properties. Consistent with a role in habits, FSIs are more excitable in habitual mice than in goal-directed mice, and acute chemogenetic inhibition of FSIs in DLS prevents the expression of habitual lever pressing. In vivo recordings further reveal a previously unappreciated selective modulation of SPNs based on their firing patterns: FSIs inhibit most SPNs but paradoxically promote the activity of a subset displaying high fractions of gamma-frequency spiking. These results establish a microcircuit mechanism for habits and provide a new example of how interneurons mediate experience-dependent behavior.


Introduction
Habit formation is an adaptive behavioral response to frequent and positively reinforcing experiences. Once established, habits allow routine actions to be triggered by external cues. This automation frees cognitive resources that would otherwise process action-outcome relationships underlying goal-directed behavior. The dorsolateral region of the striatum has been heavily implicated in the formation and expression of habits through lesion and inactivation studies 2, 3, in vivo recordings 4, 5, and changes in synaptic strength 6. More recently, properties of the dorsolateral striatum (DLS) input-output transformation of afferent activity to striatal projection neuron firing were found to predict the extent of habitual behavior in individual animals 7. Despite these observations, the cellular microcircuit mechanisms driving habitual behavior have not been identified.

DLS output arises from striatal projection neurons (SPNs), which comprise ~95% of striatal neurons and project to either the direct (dSPNs) or indirect (iSPNs) basal ganglia pathways. The properties of evoked SPN firing ex vivo linearly predict behavior across the goal-directed to habitual spectrum in an operant lever pressing task 7. Specifically, habitual responding correlates with larger evoked responses in both the direct and indirect pathways as well as a shorter latency to fire of dSPNs relative to iSPNs. To identify a microcircuit mechanism for habitual behavior, we manipulated the striatal microcircuitry to identify local circuit elements that modulated these habit-predictive SPN firing properties (Fig. 1A, B).

Glutamatergic corticostriatal synapses express dopamine-dependent forms of long-lasting synaptic potentiation and depression 8, making these connections a fitting site for experience-dependent adaptation of striatal output.
Although such plasticity accompanies changes in behavior, including the formation of habits 6, 9, it does not readily explain the finding that increased gain in the direct and indirect SPNs in habitual mice was balanced 7, since synaptic strengthening would occur separately on the two SPN classes through dichotomous mechanisms 8. In addition, within the DLS, habit-predictive SPN firing properties were distributed uniformly rather than in discrete subpopulations of SPNs 7. Because interneurons are often anatomically suited to tune

recordings for each SPN subtype (see Materials and methods). Firing properties were compared within-cell before and after wash-in of IEM-1460.
Rather than specifying an arbitrary cutoff value for the transient amplitude, we used an unsupervised clustering algorithm known as a Gaussian mixture model (GMM) to separate SPNs into two clusters. Based on calibration data in this preparation demonstrating the relationship between calcium transient amplitude and number of action potentials 7, the GMM separated SPNs into clusters corresponding to multi-action potential (larger transients; "high-firing") and single-action potential (smaller transients; "low-firing") responses (Fig. 1D). Compared to the use of a physiologically-based 0.05 ΔF/F0 cutoff value, the unbiased GMM classification was in 90.5% agreement. According to this pre-IEM-1460 categorization, low-firing SPNs were unaffected whereas calcium transient amplitudes of high-firing SPNs were significantly reduced by IEM-1460 (Fig. 1D).

(A) Schematic of calcium imaging approach. Top: SPN activity was evoked by electrical stimulation of cortical afferent fibers in an acute parasagittal brain slice. Bottom: Evoked SPN firing was imaged in the direct and indirect pathways simultaneously using a transgenic direct pathway reporter mouse line (left), calcium indicator dye fura-2 (middle) and two-photon laser scanning microscopy (right, see scanning vector in overlay). (B) Experimental approach. Striatal microcircuitry was manipulated in tissue from untrained animals in order to reproduce the known circuit substrate for habitual behavior (described in O'Hare & Ade, et al. 2016) and thereby identify a candidate microcircuit mechanism. (C) Representative heat maps of dSPN (x) and iSPN (•) calcium transient amplitudes before (left) and after (right) pharmacological inhibition of FSIs using IEM-1460 show a selective reduction in cells with the strongest (bright red) initial responses. (D) Left: Representative SPN calcium transient waveforms before and after wash-in of IEM-1460. SPNs were grouped into "high-firing" (red) or "low-firing" (blue) clusters based solely on their baseline response amplitudes using a Gaussian mixture model. SPNs with strong baseline responses (red, "high firing") show weaker responses after wash-in whereas those with initially weak responses (blue, "low firing") are unaffected. Right: Evoked calcium transient amplitudes for all imaged SPNs before (-) and after (+) wash-in of IEM-1460. For both cell types, high-firing SPNs showed decreased responses after IEM-1460 wash-in (dSPNs: t(22) = 6.43, p = 0.0000018, n = 23 cells; iSPNs: t(17) = 3.43, p = 0.0032, n = 18 cells) whereas low-firing SPNs did not (dSPNs: p = 0.24, n = 64 cells; iSPNs: p = 0.21, n = 34 cells). (E) Linear regression and correlational analyses show that the inhibitory effect of IEM-1460 on SPN responses (post - baseline difference) is a linear function of baseline response amplitudes for both dSPNs (red; r(86) = -0.87, p = 2.20 x 10^-28, n = 87 cells) and iSPNs (green; r(51) = -0.80, p = 1.59 x 10^-12, n = 52 cells). (F) Relative pathway timing, as measured by latency to peak detection, before and after inhibition of FSIs using IEM-1460. Indirect pathway activation precedes direct pathway activation by a greater margin after wash-in of IEM-1460 (t(102) = 2.42, p = 0.017, n = 52 independent dSPN/iSPN pairs). *p < 0.05. Dotted error bands indicate 95% confidence interval. Error bars indicate SEM.

GMMs contains parameters for the Gaussian mixture model fits on pre-IEM-1460 calcium transient amplitude data by cell type. Amplitude values are included for high- and low-firing dSPNs and iSPNs in dSPNs_high, dSPNs_low, iSPNs_high, and iSPNs_low. Matrices are N x 2 with column 1 containing pre-drug amplitudes and column 2 containing paired measurements after drug wash-in. Data can be combined within cell type and run through PrePostGMM.m to reproduce the clustering shown in Figure 1D (see comments in code).
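The two-cluster split described above can be illustrated with a minimal sketch. This is not the authors' PrePostGMM.m: it is a plain-Python, one-dimensional, two-component Gaussian mixture fitted by expectation-maximization, and the function names (`fit_two_gaussians`, `label_cells`, `agreement_with_cutoff`), the initialization scheme, and the amplitude values are all hypothetical. Only the idea of clustering baseline amplitudes and comparing against a fixed 0.05 ΔF/F0 cutoff comes from the text.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_gaussians(values, n_iter=200):
    """Fit a 1-D, two-component Gaussian mixture by expectation-maximization."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Assumed initialization: component means from the lower and upper halves.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    grand_mean = sum(values) / len(values)
    pooled_var = sum((v - grand_mean) ** 2 for v in values) / len(values)
    variances = [pooled_var, pooled_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            joint = [weights[k] * normal_pdf(v, means[k], variances[k]) for k in (0, 1)]
            total = sum(joint)
            resp.append([j / total for j in joint] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            variances[k] = max(
                sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk,
                1e-10,  # floor to avoid a collapsed component
            )
    return means, resp

def label_cells(values):
    """Label each baseline amplitude 'high' or 'low' by its more probable cluster."""
    means, resp = fit_two_gaussians(values)
    high = 0 if means[0] > means[1] else 1
    return ["high" if r[high] > 0.5 else "low" for r in resp]

def agreement_with_cutoff(values, labels, cutoff=0.05):
    """Fraction of cells on which the GMM labels match a fixed amplitude cutoff."""
    cutoff_labels = ["high" if v >= cutoff else "low" for v in values]
    return sum(a == b for a, b in zip(labels, cutoff_labels)) / len(values)

# Synthetic, illustrative baseline amplitudes (not data from the study).
baseline = [0.010 + 0.001 * i for i in range(21)] + [0.08 + 0.005 * i for i in range(13)]
labels = label_cells(baseline)
print(labels.count("high"), labels.count("low"), agreement_with_cutoff(baseline, labels))
```

Because the clusters are defined on pre-drug amplitudes only, the post-drug measurements never influence which cells are called high-firing, which is the property the analysis relies on.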
(t(7) = 2.37, p = 0.029, n = 8) while vehicle had no significant effect (p = 0.76, n = 8). Moreover, the same selectivity for modulating multi-action potential responses was observed in that the magnitude of IEM-1460's effect correlated with the size of baseline responses and there was no effect on single-

Altogether, this series of experiments identifies a pharmacological agent that potently inhibits FSI activity and modulates all of the habit-predictive SPN firing properties. These results were surprising for two reasons. First, rather than a blockade of FSI activity causing disinhibition of SPNs as we had hypothesized, we found that when FSI activity was reduced, SPN activity was also reduced. This result suggests that FSI activity is capable of promoting, rather than inhibiting, SPN activity at least in the acute brain slice preparation. Second, although IEM-1460 strikingly affected the same features of DLS output that predict the expression of habitual behavior (calcium transient amplitude in both pathways and relative pathway timing) 7, the directionality of these effects was opposite in all measures. Therefore, these results revise the overall hypothesis to involve a gain, rather than loss, of FSI activity as a candidate mechanism for habitual behavior.

IEM-1460 selectively inhibits evoked multi-action potential SPN responses ex vivo.
Cell-attached electrophysiological recordings showing selective effect of IEM-1460 for multi-action potential SPN responses to afferent stimulation. Left: example trace showing multi-action potential SPN response to single-pulse stimulation of cortical afferents (top) and response to same stimulus after drug wash-in (bottom). Right: Effect of IEM-1460 (left) and vehicle (right) as a function of mean # APs fired prior to drug wash-in. IEM-1460 consistently reduced SPN responses to singlets (r(7) = 0.94, p = 0.00060, n = 8 cells) whereas vehicle had no such effect (mean effect = 0.28 ± 0.66; p = 0.89 for correlational analysis, n = 8 cells). *p < 0.05. Dotted error bands indicate 95% confidence interval.

To examine the contribution of FSI activity to SPN firing, cortically-evoked SPN action

FSIs undergo long-lasting plasticity to become strengthened with habit formation
While results thus far show that FSIs appear capable of specifically modulating habit-predictive properties of striatal output, we next examined whether FSI activity was different as a result of experience. We measured FSI synaptic and cellular electrophysiological properties in DLS brain slices prepared from habitual and goal-directed mice.

PV-Cre mice were bilaterally injected with AAV5-Ef1a-DIO-EYFP in the DLS to label PV+ interneurons and subsequently trained on an operant task in which they learned to press a lever for sucrose pellet rewards.
Lever presses were reinforced on a random interval (RI) schedule to induce habit formation 27, 28 or on an abbreviated random ratio (RRshort) schedule to produce goal-directed behavior 7 (Fig. 3-figure supplement 1). Habit was measured by evaluating the sensitivity of the learned lever press behavior to devaluation of the sucrose pellet reward. Goal-directed performance is known to be highly sensitive to outcome devaluation whereas habitual performance is less sensitive 27-29. The sucrose pellet reward was devalued by inducing sensory-specific satiety. Specifically, mice were pre-fed with the reward pellets or, as a control for general satiety-related behavioral changes, identically-sized normal grain pellets. On separate but consecutive days, mice were alternately pre-fed 1.3 g of either the sucrose pellet reward (devalued condition) or the grain-only pellet (non-devalued condition), counterbalancing which pre-feed condition was tested first. Lever press rates were then measured during brief 3-minute probe tests without reinforcement. Habitual behavior was quantified in individual mice as the log2 ratio of the devalued versus non-devalued lever press rates (normalized devalued lever press rate; NDLPr). RI-trained mice with an NDLPr ≥ 0, i.e. insensitive to outcome devaluation, were considered to be habitual. RRshort-trained mice with an NDLPr < 0 were considered to be goal-directed (
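The NDLPr measure and the inclusion rule can be written out as a short sketch. The function names (`ndlpr`, `classify_mouse`) and the example press rates are hypothetical. The handling of a zero press rate is an assumption, since the text does not say how such cases were treated. The thresholds (NDLPr ≥ 0 for RI-trained, NDLPr < 0 for RRshort-trained) follow the description above.

```python
import math

def ndlpr(devalued_rate, non_devalued_rate):
    """Normalized devalued lever press rate: log2(devalued / non-devalued)."""
    if devalued_rate <= 0 or non_devalued_rate <= 0:
        # Assumption: the source does not describe zero-rate handling.
        raise ValueError("press rates must be positive to take a log ratio")
    return math.log2(devalued_rate / non_devalued_rate)

def classify_mouse(schedule, value):
    """Map training schedule and NDLPr to a behavioral group, or 'excluded'."""
    if schedule == "RI" and value >= 0:
        return "habitual"
    if schedule == "RRshort" and value < 0:
        return "goal-directed"
    # Behavior inconsistent with the training schedule.
    return "excluded"

# Illustrative press rates (presses per minute), not data from the study.
print(classify_mouse("RI", ndlpr(12.0, 10.0)))      # pressed more when devalued
print(classify_mouse("RRshort", ndlpr(3.0, 12.0)))  # pressed less when devalued
print(classify_mouse("RI", ndlpr(3.0, 12.0)))       # RI-trained but devaluation-sensitive
```

Using a log2 ratio makes the measure symmetric around zero: a doubling and a halving of the devalued press rate give +1 and -1, respectively.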

Electrophysiological properties of FSIs from habitual and goal-directed mice.
(A) Learning curves showing lever press rate over training sessions. Mice acquired lever pressing behavior with continuous reinforcement (CRF) of lever presses and were then trained on either random interval (RI) or abbreviated random ratio (RRshort) reinforcement schedules to induce habitual and goal-directed behavior, respectively. A final training session was administered after devaluation testing, and 0-24 hrs prior to recording, to mitigate any effects of devaluation testing. (B) Inclusion criteria for analysis of electrophysiological data. RRshort-trained mice that expressed goal-directed behavior (NDLPr < 0) and RI-trained mice that expressed habitual behavior (NDLPr ≥ 0) were included. Mice that expressed modes of behavioral control inconsistent with training, i.e. NDLPr < 0 for RI-trained mice, were excluded from analysis. (C-D) Goal-directed (orange) and habitual (purple) mice used for group-wise comparisons of electrophysiological properties did not differ in total number of lever presses (p = 0.72, n = 7 & 5 mice) or number of rewards delivered (p = 0.72, n = 7 & 5 mice, Mann-Whitney U test) over the course of training. (E-H) Passive membrane properties of FSIs in slices from goal-directed and habitual mice. No differences were found for any membrane property (p = 0.13, 0.081, 0.67, 0.58, n = 15 & 9 cells). (I) Left to right: representative action potential traces and quantification of action potential amplitude, half-width, and afterhyperpolarization current duration for FSIs from goal-directed and habitual mice. No difference was detected for any waveform property (p = 0.60, 0.71, 0.53, n = 13 & 8 cells). Data are represented as mean ± SEM.

goal-directed and habitual FSIs (Fig. 3A). Additionally, paired-pulse ratios of evoked EPSCs measured at a 50 ms inter-stimulus interval were similar between groups (Fig. 3B).
During these recordings, we also did not observe any group differences in a number of passive membrane

Rather than changes in synaptic strength, we instead found robust differences in FSI firing responses to somatic current injection. FSIs from habitual mice displayed higher firing rates compared to FSIs from goal-directed mice (Fig. 3C). Action potential kinetics did not appear to explain these group differences in firing rates as action potential waveforms were not appreciably

Habitual behavior was associated with increased FSI firing in response to somatic current injection. However, it was afferent activation that initially revealed habit-predictive striatal output properties 7. Therefore, in order for FSI plasticity to alter striatal output, it must be sufficient to differentially drive FSI firing in response to similar coincident synaptic excitation. FSI firing was monitored in cell-attached mode in response to electrical stimulation of excitatory afferents. We found that FSIs of habitual mice fired more readily than those from mice with goal-directed behavior (Fig. 3F). This habit-related difference in FSI excitability was not readily explained by other aspects of lever pressing performance including the total number of lever presses or rewards delivered over the course of training (Fig. 3-figure supplement 1). We noted the apparent bimodal distribution of total

Since photo-inhibiting FSIs produces striatal output properties that directly oppose those seen in habit (Fig. 1), we inhibited FSIs after habit training to determine the necessity of FSI activity for expression of habitual behavior. Mice underwent habit-training protocols in the operant lever press task and then, prior to testing the degree of habitual responding, FSIs were inhibited chemogenetically.
We selected a chemogenetic approach to allow for continuous modulation of activity during the 3-minute probe tests which measure habitual behavior. Drd1a-tdTomato 26::PV-Cre mice were bilaterally injected in DLS with AAV vectors Cre-dependently encoding either the inhibitory hM4D chemogenetic receptor 30 (PV-hM4D) or eYFP (PV-eYFP) (Fig. 4A, B). Both groups underwent the same habit-promoting RI reinforcement protocol and learned similarly (Fig. 4C). For both the devalued and non-devalued conditions, after each pre-feeding period and thirty minutes prior to the outcome devaluation probe tests, the hM4D agonist clozapine N-oxide (CNO, 5 mg/kg) was delivered intraperitoneally (Fig. 4D).

Chemogenetic inhibition of PV+ interneurons did not affect operant behavior in general, as evidenced by indistinguishable lever press rates between groups in the non-devalued (grain pellets) condition (Fig. 4-figure supplement 1). In contrast, a comparison of sensitivity to outcome devaluation between groups revealed that habit expression was suppressed in PV-hM4D mice relative to PV-eYFP controls (Fig. 4E). Mean NDLPr for RI-trained PV-eYFP control mice measured at 0.46 ± 0.27, indicating that control mice were insensitive to outcome devaluation, i.e. habitual. By contrast, PV-hM4D mice, which received the same RI training schedule and showed comparable rates of lever pressing (Fig. 4C)

(C) Learning curves for hM4D and reporter construct-injected cohorts show that groups did not learn the task differently (p = 0.70, n = 10 & 11 mice). (D) Experimental flow of devaluation testing to evaluate habit expression. Upon completion of multi-day training sessions, mice were pre-fed sucrose or grain pellets on alternating days, intraperitoneally administered CNO, and subjected to a 3-minute extinction probe test 30 minutes later. Devalued (sucrose) and non-devalued (grain) lever press rates (LPr) are compared ratiometrically using the normalized devalued LPr (NDLPr) to assess habitual behavior: $\mathrm{NDLPr} = \log_2(\mathrm{LPr}_{devalued} / \mathrm{LPr}_{non\text{-}devalued})$. (E) Quantification of habit expression in individual subjects using NDLPr. PV-hM4D mice showed less habit expression relative to PV-eYFP controls (t(19) = 2.66, p = 0.016, n = 10 & 11 mice). *p < 0.05. Data are represented as mean ± SEM.

Chemogenetic inhibition of FSIs in dorsolateral striatum does not affect operant lever pressing in general.
Lever press rates during the non-devalued probe test. Mice from both groups were pre-fed a sensory-specific satiety control pellet (grain-only) and administered CNO (5 mg/kg, intraperitoneally) prior to undergoing a 3 min extinction probe test to assess the effect of inhibiting FSIs on operant behavior independent of sensitivity to outcome value, i.e. habit. Mice expressing hM4D and eYFP in FSIs of the DLS did not differ in response rates (p = 0.53, n = 10 & 11 mice), indicating that inhibition of DLS FSIs did not affect general lever pressing behavior. Two mice displayed unilateral infection (yellow) as opposed to bilateral (green). Because inclusion or exclusion of these data did not affect statistical results for any behavioral measure, data were included and indicated as above. Data are represented as mean ± SEM.
virus. Single units corresponding to both FSIs and SPNs were recorded in freely-moving mice (Fig. 5A-D) for 30 min before intraperitoneal (i.

Fig. 5F).

That is, the higher the fraction of gamma-frequency spikes an SPN fired, the more likely it was to fire less when FSIs were chemogenetically inhibited. No such relationship was observed in response to vehicle (Fig. 5G, right).

firing as traditionally assumed, we also found evidence that they potentiate activity in a select population of SPNs that displays higher fractions of gamma-frequency spiking. This selective potentiation may be akin to a winner-take-all "focusing" mechanism that increases the signal-to-noise ratio in corticostriatal transmission. According to such a mechanism, the subset of recruited SPNs would be facilitated while the less-relevant, low-gamma SPNs would be suppressed.

The approach we took to reveal the microcircuit mechanisms for habit was to identify a potential source for the broad local DLS circuit reorganizations of SPN firing properties that strongly correlate with habit (Fig. 1A, B). To do this, we first examined how FSIs influenced striatal output using a pharmacological approach that inhibits excitatory synapses on striatal FSIs (and also CINs).

In brain slices from untrained mice, IEM-1460 treatment showed striking specificity in that it modulated all of the previously described 7 habit-predictive properties of evoked SPN firing ex vivo: gain of dSPN and iSPN responses (Fig. 1D, E), and the relative timing of firing between dSPNs and iSPNs (Fig. 1F). IEM-1460 also showed specificity in that it did not affect properties such as spike probability (Fig. 1-figure supplement 1) that are not predictive of habit.

Unexpectedly, we found that the directionality by which FSIs modulated these properties was opposite to our original hypothesis: instead of the expected disinhibition of SPNs, silencing FSIs reduced SPN output (Fig. 1B-E). FSI inhibition also altered the timing of direct and indirect pathway neuron firing in a direction that opposed the habit circuit signature (Fig. 1B, F)

overall increase in projection neuron activity (Fig. 5F) which suggests that reducing FSI activity specifically may impair habit expression differently than a general inactivation of the circuitry.

Using opto- and chemo-genetic manipulations, we further found that FSIs, which are

when FSI activity was reduced (Fig. 5G). This finding is reminiscent of a previous in vivo report that

The CAG-FLEX-rev-hM4D:2a:GFP plasmid was provided by the Sternson laboratory at Janelia Farm (Addgene #52536). UNC Viral Vector Core packaged this plasmid into AAV2/5 and also provided AAV2/5-EF1a-DIO-EYFP. All viral aliquots had titers above 1 x 10^12 particles/mL.

Intracranial viral injections
Stereotaxic injections were carried out on 2-3 month old PV-Cre::Drd1a-tdTomato mice under isoflurane anesthesia (4% induction, 0.5-1.0% maintenance). Meloxicam (2 mg/kg) was administered subcutaneously after anesthesia induction and prior to surgical procedures for postoperative pain relief. Small craniotomies were made over the injection sites and 1.0 μL virus was delivered bilaterally to dorsolateral striatum via a Nanoject II (Drummond Scientific) at a rate of 0.1 μL/min. The injection pipette was held in place for 5 minutes following injection and then slowly removed. Coordinates for all injections relative to bregma were as follows:

showing no expression or poor targeting (misses were medial to DLS) were excluded from the study prior to behavioral analysis and data unblinding.
Two AAV2/5-CAG-FLEX-rev-hM4D:2a:GFP-injected mice showed expression in only one hemisphere of DLS. These mice were included for behavioral analysis and behaved no differently from bilaterally-infected mice. We note that exclusion of these two subjects does not affect the statistical significance of the result.

Lever press training
Prior to training, animals were restricted to 85-90% baseline weight to motivate learning. Lever presses were rewarded with sucrose-containing pellets (Bio-serv, F05684) and grain-only pellets (Bio-serv, F05934) were used as a sensory-specific control for satiety. Mice were trained in Med Associates operant chambers housed within light-resistant, sound-attenuating cabinets (ENV-022MD). Lever presses and food cup entries were recorded by Med-PC-IV software. During RR reinforcement, pellets were delivered every X lever presses on average for an RR-X schedule. RI reinforcement gave a 10% probability of reward every X seconds for an RI-X schedule. Following random reinforcement training, subjects underwent devaluation testing to measure habitual behavior as previously described 7. When training schedule was a variable, experiments were performed with experimenter blind to training schedule.

For electrophysiological assessment of FSI properties, acute brain slices were prepared 0-24 hours after the final training session. Mice were excluded from analysis if they did not display the behavior that was expected based on training schedule. Specifically, mice that were trained to be habitual (random interval reinforcement) yet showed sensitivity to outcome devaluation (NDLPr < 0) were excluded.

Brain slice preparation
Animals were anesthetized using 2,2,2-tribromoethanol and transcardially perfused with ice-cold N-Methyl-D-glucamine (

Cell-attached experiments: Stimuli were delivered to cortical afferent fibers at the cortical side of the internal capsule (Fig. 2A) using a bipolar stimulating electrode (FHC, CBARC75). Responses in SPNs and FSIs were recorded in cell-attached configuration with voltage clamped at 0 mV. Leak current was continuously monitored to detect partial break-ins. In the event of a partial membrane rupture, leak currents increased significantly due to the voltage at which the membrane patch was clamped. In these events, data were discarded. The same potassium methanesulfonate-based internal solution as in the current clamp experiments was used to enable break-in and cell type identification or further recordings after cell-attached experiments concluded. All stimuli were delivered with a 20 second inter-stimulus interval. For input-output experiments, 300 μs single-pulse stimuli were delivered with 5 sweeps per intensity, in order from weakest to strongest intensity, and cells were recorded at a consistent distance from the stimulating electrode (600-650 μm). For pre-post experiments with application of IEM-1460, 300-600 μs single-pulse stimuli were delivered to drive multi-action potential responses prior to drug wash-in. 10 sweeps were analyzed as baseline and another 10 sweeps, using the same stimulus parameters, following a 20-minute wash-in period were analyzed to measure drug effect.

In vitro optical inhibition of FSIs: 532 nm light was delivered from a diode-pumped solid state laser (Opto Engine) coupled to a 300 μm core, 0.39 NA patch cable which terminated into a 2.5 mm ferrule (Thorlabs Inc.). The ferrule was submerged in the perfusion chamber and positioned with a micromanipulator to illuminate a ~0.5 mm radius around the tip of the recording pipet. Laser onset coincided with electrical stimulation of cortical afferents. Laser stimulation lasted 500 ms in whole cell current clamp experiments and 1 sec when monitoring synaptically-evoked responses in cell-attached mode.

In vivo single-unit recordings: Custom-made multi-electrode arrays were used for all recordings. The arrays consisted of fine-cut tungsten wires and a 6-cm-long silver grounding wire. Tungsten wires were 35 μm in diameter and 6 mm in length, arranged in a 4 × 4 configuration. The row spacing was 150 μm, and electrode spacing was 150 μm. All arrays were attached to the 16-channel Omnetics connector and fixed to the skull with dental acrylic. After hM4D viral injection into the dorsolateral striatum, the electrode arrays were lowered at the following stereotaxic coordinates in relation to bregma: 0.8 rostral, 2.75 lateral, and 2.6 mm below brain surface. Single-unit activity was recorded with miniaturized wireless headstages (Triangle BioSystems International) using the Cerebus data acquisition system (Blackrock Microsystems), as previously described 58. The chronically implanted electrode array was connected to a wireless transmitter cap (~3.8 g). During recording sessions, single units were selected using online sorting. Infrared reflective markers (6.35 mm diameter) were affixed to recording headstages to track mouse position as subjects moved freely on a raised platform. Marker position was monitored at 100 Hz sampling rate by eight Raptor-H Digital Cameras (MotionAnalysis Corp.). Before data analysis, the waveforms were sorted again using Offline Sorter (Plexon). Only single-unit activity with a clear separation from noise was used for the data analysis. In each case, a unit was only included if action potential amplitude was ≥ 5 times that of the noise band. FSIs and SPNs were classified on the basis of spike width and baseline firing rate (Fig. 5A-D

from single neurons that were positively identified as the corresponding cell type in a subsequent whole cell current clamp recording. The dot product representing a perfect fit was obtained by cross-correlating the template peak to itself. If the dot product of the data and the template peak was equal to or greater than 25% of this perfect fit, then an action potential was called by a peak detection algorithm (Mathworks, Inc.). Stimulus artifacts and rare spontaneous action potentials were excluded by only analyzing data from 1-100 ms (FSIs) or 1-600 ms (SPNs) after stimulus delivery. Due to the sharp FSI cell-attached waveform, electrical noise was matched to the FSI template peak in some recordings. To exclude these false calls, two additional exclusion criteria were added: action potentials were excluded (1) if their amplitudes were less than 10 standard deviations of the recording minus the stimulus artifact, i.e. electrical noise, and (2) if their cross-correlation peak amplitudes were less than 25% of the maximum peak in a given sweep.
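The template-matching step can be sketched as follows. This is an illustrative plain-Python version, not the authors' MATLAB routine. The function names (`sliding_dot`, `detect_events`), the toy template, and the synthetic trace are hypothetical. The post-stimulus analysis window and the two additional noise-exclusion criteria are omitted. Only the rule of calling an event where the data-template dot product reaches 25% of the template's dot product with itself is taken from the text.

```python
def sliding_dot(trace, template):
    """Dot product of the template with every same-length window of the trace."""
    n = len(template)
    return [sum(trace[i + j] * template[j] for j in range(n))
            for i in range(len(trace) - n + 1)]

def detect_events(trace, template, fraction=0.25):
    """Return window offsets where the match score is a local peak above threshold."""
    perfect_fit = sum(t * t for t in template)  # template matched to itself
    threshold = fraction * perfect_fit
    scores = sliding_dot(trace, template)
    events = []
    for i in range(1, len(scores) - 1):
        is_local_peak = scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
        if is_local_peak and scores[i] >= threshold:
            events.append(i)
    return events

# Toy template and synthetic trace (illustrative values only).
template = [0.0, 1.0, -0.5, 0.0]
trace = [0.0] * 30
for offset, scale in ((5, 1.0), (18, 1.0), (25, 0.2)):
    for j, t in enumerate(template):
        trace[offset + j] += scale * t

print(detect_events(trace, template))
```

In this toy trace the two full-size events are called, while the event scaled to 20% of the template falls below the 25% criterion and is ignored.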

Current-clamp experiments: Action potentials were detected by running a peak detection algorithm (Mathworks Inc.) on voltage velocity data with a peak threshold of 1 x 10^4 V/s and a minimum peak distance of 2 ms. Action potential onset and offset were defined at the intersections of the waveform with a sliding mean baseline voltage that constituted 10% of the length of the current injection. Action potential and after-hyperpolarization properties were measured up to the point when increasing current injection attenuated firing rate. Action potential half-width was defined as half the time between onset and peak voltage. Action potential amplitude was defined as the voltage difference between the sliding baseline and peak amplitude. AHP onset and offset were defined as the next two intersections with the sliding baseline after the action potential peak voltage. AHP amplitude was defined as the negative-most voltage between onset and offset and the AHP waveform was integrated over the sliding baseline for total voltage. AHP voltage measurements were converted to current using input resistance. Firing rates were measured in response to a series of increasing 500 ms current step amplitudes ranging from -0.4 to 2.0 nA in 200 pA intervals. Maximum response duration was defined as the longest period of sustained firing observed during this series of current injections. Rheobase was determined by identifying the 200 pA interval in which the first action potential was fired and subsequently interrogating this interval with 500 ms current injections at 10 pA resolution. Subthreshold test pulses were used to determine passive membrane properties. Input resistance was calculated as $R_i = dV/I$. Whole cell capacitance was calculated by integrating the decay phase after current injection to measure discharged current and dividing by voltage of the current injection: $C = \int V_{decay}\,dt \,/\, (I R_i^{2})$.
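One way to read the capacitance expression above, offered only as a sketch: assume a subthreshold current step of amplitude $I$ produces a steady-state deflection $\Delta V = I R_i$, and that the charge discharged after step offset is obtained by dividing the decaying voltage by $R_i$. The symbols $Q$ and $\Delta V$ are introduced here for illustration and do not appear in the original text.

```latex
\begin{align}
Q &= \int \frac{V_{decay}(t)}{R_i}\,dt \\
\Delta V &= I R_i \\
C &= \frac{Q}{\Delta V} = \frac{\int V_{decay}(t)\,dt}{I R_i^{2}}
\end{align}
```

Under these assumptions the $R_i^{2}$ in the denominator arises because input resistance is used twice: once to convert the decaying voltage to current, and once to express the voltage step in terms of the injected current.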
Series resistance was calculated by fitting a standard double-exponential function to the decay transient and deriving the time constant $\tau_{fast} = 1/\lambda_{fast}$ to find $\tau_{fast} = R_s \times C_{whole\ cell}$. Cells with $R_s$ > 30 megaOhms were excluded from analysis.

Voltage clamp experiments: Voltage clamp experiments assessing habit-related FSI physiology were carried out in the presence of picrotoxin (50 μM). Paired pulse ratio was calculated as $\log_2(\mathrm{EPSC}_2/\mathrm{EPSC}_1)$ for first and second EPSC amplitudes. Paired stimuli were delivered 50 ms apart. Spontaneous EPSCs were recorded at Vm = -70 mV at 5X gain for 5 minutes per cell. Automated event detection was performed using MiniAnalysis (Synaptosoft). To validate the use of PV-Arch, 532 nm light-induced currents were recorded in FSIs and SPNs in the presence of gabazine (10 μM), AP5 (μM), and NBQX (50 μM) to block GABA A, NMDA, and AMPA receptors, respectively.

In vivo single-unit recordings: Single unit activity was sorted into frequency bins by converting interspike intervals to instantaneous firing rates. Frequency bands were defined as δ = 0-4 Hz, θ = 4-8 Hz, α = 8-13 Hz, β = 13-30 Hz, and γ = 30-100 Hz. The fraction of ISIs falling in a particular frequency band was calculated relative to the total number of ISIs. To compare frequency band distributions of single unit records to rate-matched Poisson processes, for each single unit with N ISIs, N points were randomly drawn from a Poisson distribution with λ set to the mean ISI (1 / mean firing rate) for the corresponding single unit. This simulation was run 20 times per single unit. All 20 simulations were binned according to the described frequency band bounds and normalized counts were averaged across simulations.
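The band-fraction calculation and the rate-matched surrogate can be sketched as below. The function names (`band_fractions`, `surrogate_fractions`) and the example ISIs are hypothetical. Treating band edges as half-open intervals is an assumption. Drawing surrogate ISIs from an exponential distribution with the unit's mean ISI is one interpretation of a rate-matched Poisson process, not necessarily the authors' exact sampling step.

```python
import random

# Band edges in Hz, as defined in the text; [low, high) edges are an assumption.
BANDS = {"delta": (0, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 100)}

def band_fractions(isis):
    """Fraction of ISIs whose instantaneous rate (1/ISI) falls in each band."""
    counts = dict.fromkeys(BANDS, 0)
    for isi in isis:
        if isi <= 0:
            continue  # no finite instantaneous rate; still counted in the total
        rate = 1.0 / isi
        for name, (low, high) in BANDS.items():
            if low <= rate < high:
                counts[name] += 1
                break
    total = len(isis)  # fractions are relative to the total number of ISIs
    return {name: c / total for name, c in counts.items()}

def surrogate_fractions(isis, n_sims=20, seed=0):
    """Average band fractions over rate-matched surrogate spike trains."""
    rng = random.Random(seed)
    mean_isi = sum(isis) / len(isis)
    totals = dict.fromkeys(BANDS, 0.0)
    for _ in range(n_sims):
        # Assumed surrogate: exponential ISIs with the same mean as the unit.
        simulated = [rng.expovariate(1.0 / mean_isi) for _ in isis]
        fractions = band_fractions(simulated)
        for name in BANDS:
            totals[name] += fractions[name]
    return {name: t / n_sims for name, t in totals.items()}

# Illustrative ISIs in seconds (not data from the study).
example_isis = [0.02, 0.02, 0.1, 0.5]
print(band_fractions(example_isis))
print(surrogate_fractions(example_isis))
```

Instantaneous rates above 100 Hz fall outside every band but remain in the denominator, so the fractions for a unit need not sum to one.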
Since each simulated unit corresponded to a real recording with mean firing rate = 1/λ, observed and simulated data were compared via multiple paired t-tests and Bonferroni-Sidak correction for multiple comparisons. For behavioral analysis, 3D tracking data were transformed into Cartesian coordinates (x, y and z) by the Cortex software (MotionAnalysis Corp.) to allow distance calculations.

2PLSM calcium imaging: Raw frames were corrected using a drift correction algorithm 60 to control for minor fluctuations in X and Y. Baseline fluorescence was measured over a 2 second sliding window to calculate change in fluorescence over baseline (ΔF/F0). Action potentials were detected using a cross-correlation approach as described for current clamp and cell-attached recordings above. The template peak was generated by simultaneous calcium imaging + cell-attached electrophysiological experiments and represented a single action potential 7. Detected peaks possessed dot-products at least 50% that of a perfect fit (cross-correlating template to itself). Although dSPN and iSPN calcium transients are similar in these experimental conditions 7, separate template peaks corresponding to the SPN subtype classification of each ROI were used. Additional inclusion criteria beyond the cross-correlation threshold were used at the level of event detection, ROI inclusion, and slice inclusion to maximize data quality and reliability. Detected events were included as evoked responses only if they occurred within 375 ms of stimulation; any other events were excluded from analysis. Additionally, a lockout window was set in the peak detection algorithm to ensure that no event could occur within 1 second of the previously detected event. For an ROI to be included, a noise threshold was