Main

The basal ganglia are a group of interconnected subcortical nuclei that integrate information from multiple brain centres to modulate goal-directed behaviour. The striatum is the principal input structure of the basal ganglia, and its function is controlled by a complex array of neurotransmitters and neuromodulators2,3. Among these is dopamine (DA), which is released in the striatum by long-range axons arising from midbrain ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) neurons4,5,6. DA neurons (DANs) are thought to drive reinforcement learning by encoding reward prediction error—the difference between experienced and expected reward—and, mechanistically, by regulating multiple aspects of neuronal and synapse function7,8,9,10. Disruption of DA signalling contributes to many debilitating psychomotor disorders, including Parkinson’s disease and drug addiction11.

In addition to having the highest concentrations of DA and DA receptors in the mammalian brain, the striatum also contains some of the highest levels of acetylcholine (Ach)12,13, which is primarily released by local cholinergic interneurons (CINs), a rare and specialized cell type14,15. Pioneering studies in primates revealed that CINs reduce or ‘pause’ their firing in response to both appetitive and aversive stimuli over the course of learning, leading to the hypothesis that they modulate reinforcement learning15,16,17,18,19. CIN pauses might in turn alter the plasticity of corticostriatal synapses to support procedural learning20.

Bidirectional interactions between DA and Ach release have long been observed within the striatum during learning and in Parkinson’s disease15,16,21,22. Subsequent in vitro studies uncovered a striatal circuit through which DA and Ach directly influence each other. Synchronized firing of multiple CINs activates nicotinic Ach receptors located on DAN axons, depolarizing them23,24,25,26. If this depolarization is of sufficient amplitude, it induces a propagating axonal action potential that evokes the release of DA within the striatum25 (Fig. 1a). In turn, DA potently inhibits the activity of CINs by acting on the D2 dopamine receptors (D2Rs) that they express27,28,29 (Fig. 1a).

Fig. 1: Multiphasic dynamics of DA and Ach in the VLS during reward-based decision-making.

a, Proposed DA and Ach interactions. CINs release Ach, which evokes the release of DA through nicotinic acetylcholine receptors (nAchRs) on DAN terminals. Conversely, DA inhibits CINs through D2Rs. VLS, ventrolateral striatum. b, 2ABT parameters. An LED (yellow) signals trial initiation. A single reward is probabilistically delivered when the mouse makes the correct choice (P(R|left) is the probability of a reward delivered at the left port and P(R|right) is the probability of a reward delivered at the right port), and the trial ends at side-port exit. Fibre photometry is performed simultaneously. c, Ipsilateral DA and Ach dynamics and licks (yellow) recorded from an example mouse during a 2ABT session. Each row depicts the z-scored sensor signal of a trial. d, DA and Ach signals during different reward outcomes. The averaged z-scored signal ± s.e.m. is shown. Data are aligned to side-port entry (SE) (DA: n = 13 mice; Ach: n = 14 mice). e, Statistical analysis of d. Rewarded versus unrewarded trials are compared (left and right dots of each connected pair, respectively). Open circles denote a significant difference for each comparison (two-sided t-test, P < 0.05). Mean DA and ∆Ach ± s.d. are shown. Percentages represent LDA classification accuracy. f, DA and Ach release during rewarded 2ABT trials in which the LEDs that signal centre-port and side-port entry are present or omitted. Data are shown as in d (n = 4 mice). g, DA and Ach release for trials in which mice choose the same port in both the previous and the current trials, which are segregated by the reward outcome of the previous trial. Data are shown as in d (DA: n = 13 mice; Ach: n = 14 mice). h, DA and Ach release for trials in which mice switch ports. In g,h, the text denotes previous outcome followed by current outcome (for example, Win, win). Data are shown as in d (DA: n = 13 mice; Ach: n = 14 mice).

Despite a detailed mechanistic understanding of the interactions between DA and Ach in vitro, whether, when and how these interactions control DA and Ach levels to regulate striatal function in vivo is largely unknown. It is unclear whether sufficient CIN synchronization occurs in vivo and to what degree nicotinic Ach receptors are available there to evoke DA release30, nor is it known whether and when any influence of CINs on DA signalling is functionally important. Although CIN-evoked DA release has been proposed to explain differences between DAN somatic activity and striatal DA levels during motivated approach behaviours and in the encoding of reward value over longer timescales31, previous studies have reported a robust correlation between somatic and axonal signalling32,33. Finally, although CIN pauses can be induced by D2R activation, they can also be triggered by cortical and thalamic projections and by GABAergic inputs from the VTA34,35,36. Indeed, CIN-specific deletion of the gene encoding D2Rs reduces, but does not abolish, the CIN pause in a reward-based task, suggesting that other sources of modulation exist37.

Adaptive switching in decision-making

To examine local circuit interactions between striatal Ach and DA during goal-directed behaviour, and their contributions to it, we monitored the levels of both neuromodulators in mice performing a dynamic, probabilistic two-port choice task modelled after paradigms that engage striatal pathways and require striatal activity for optimal performance38,39,40. We used only male mice to avoid the variability of cholinergic signalling in females (Supplementary Information). In this two-armed bandit task (2ABT), mice move freely within a box that contains three ports separated by physical barriers (Fig. 1b). An LED above the centre port signals that the mouse can initiate a trial by placing its snout (‘poking’) into the centre port. The mouse must then choose to poke into either the left or the right port, each of which probabilistically delivers water after snout entry. In a block structure (30 rewards between block transitions), one side port is designated as ‘high reward probability’ (P(reward) = Phigh) and the other as ‘low reward probability’ (P(reward) = 1 − Phigh). To obtain rewards efficiently, the mouse must learn which port has the high reward probability in each block and detect when block transitions occur. This task structure requires mice to use flexible decision-making strategies and to integrate information about previous trial outcomes to make a choice.
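
The block structure of the task can be made concrete with a small simulation. The sketch below is a hypothetical Python environment, not the task-control software used in these experiments: the default value of Phigh (0.8), the rule that rewards from either port count towards the 30-reward block length, and the class and method names are assumptions made for illustration.

```python
import random


# Hypothetical sketch of a 2ABT environment; P_high and the block rule are assumptions.
class TwoArmedBandit:
    """Two side ports; the high-probability port swaps after a fixed number of rewards."""

    def __init__(self, p_high=0.8, rewards_per_block=30, seed=0):
        self.p_high = p_high                        # assumed value of P_high
        self.rewards_per_block = rewards_per_block  # block length, counted in rewards
        self.rng = random.Random(seed)
        self.high_port = self.rng.choice(["left", "right"])
        self.rewards_in_block = 0

    def reward_prob(self, port):
        """P(reward | port): P_high for the high port, 1 - P_high for the other."""
        return self.p_high if port == self.high_port else 1.0 - self.p_high

    def step(self, port):
        """Simulate one trial ending in `port`; return 1 if rewarded, else 0."""
        rewarded = int(self.rng.random() < self.reward_prob(port))
        self.rewards_in_block += rewarded
        if self.rewards_in_block >= self.rewards_per_block:
            # Block transition: the other port becomes the high-probability port.
            self.high_port = "left" if self.high_port == "right" else "right"
            self.rewards_in_block = 0
        return rewarded
```

Calling `env.step("left")` once per trial returns the outcome of that choice, so a simulated agent or a fitted behavioural model can be run against the same block structure that the mice experience.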

Mice robustly alter their port selections at block boundaries, repeatedly choosing the high-reward-probability port while occasionally sampling the low-reward-probability port (Extended Data Fig. 1a). After successive unrewarded trials caused by the reversal of reward probabilities at block transitions, the mice transiently increase their probability of switching ports between trials (P(switch)) (Extended Data Fig. 1b), which facilitates the selection of the new high-reward-probability port (P(high port)) (Extended Data Fig. 1c). As a result of this behavioural flexibility, proficient mice achieve rapid decision times and high reward rates (Extended Data Fig. 1d). During the 2ABT, we capture the timing of key behavioural events, including port entries and withdrawals, the timing and number of licks at each port and the reward outcomes (Extended Data Fig. 1e). Entry into and exit from the centre port occur in rapid succession, followed by a delayed entry into the side port (Extended Data Fig. 1e). In rewarded trials, water delivery is triggered by entry into the side port and mice lick repeatedly to consume the reward, whereas in unrewarded trials the mice rarely lick the port (Fig. 1c and Extended Data Fig. 1e).

To characterize the behaviour of the mice and how they accumulate evidence in the task, we used a recursively formulated logistic regression (RFLR) that was previously developed for a 2ABT40. In this model, the probability of the mouse’s next choice depends on a latent representation of evidence about the relationship between its actions and reward outcomes (that is, action value). This variable decays over time and is recursively updated by new evidence from each trial’s choice and outcome, and the model includes an additional bias towards or away from the mouse’s most recent choice (Extended Data Fig. 1f). The RFLR model uses three parameters to capture, respectively, the tendency of an animal to repeat its last action (alpha, α), the relative weight given to information about past actions and rewards (beta, β) and the time constant over which action and reward history decay (tau, τ) (Extended Data Fig. 1f). The RFLR coefficients are comparable across mice that are proficient on the 2ABT (Extended Data Fig. 1g). Moreover, the RFLR model accurately predicts both the switching dynamics at block transitions and the probability of switching on the current trial, which depends on the choice and reward history of previous trials (Extended Data Fig. 1h–j). Altogether, mice achieve high proficiency on a probabilistic reward task, and their behaviour is accurately captured by a reduced logistic regression model.
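
The recursion can be written compactly. As a minimal sketch, assuming choices coded as c_t = +1 (left) or −1 (right) and rewards as r_t ∈ {0, 1}, one parameterization consistent with the description above is φ_t = β·c_{t−1}·r_{t−1} + e^(−1/τ)·φ_{t−1} for the latent evidence and logit P(c_t = +1) = α·c_{t−1} + φ_t for the choice. This coding and the function name are assumptions for illustration; the function only evaluates predictions for given α, β and τ and does not fit them, which would additionally require maximizing the likelihood of the observed choices.

```python
import math


# Hypothetical sketch of RFLR predictions; the +1/-1 and 0/1 coding is an assumption.
def rflr_predict(choices, rewards, alpha, beta, tau):
    """Predicted P(c_t = +1) for trials t = 1..T-1.

    choices: sequence of +1 (e.g. left) / -1 (right)
    rewards: sequence of 1 (rewarded) / 0 (unrewarded)
    """
    decay = math.exp(-1.0 / tau)  # fraction of evidence carried over per trial
    phi = 0.0                     # latent action-outcome evidence (action value)
    probs = []
    for t in range(1, len(choices)):
        c_prev, r_prev = choices[t - 1], rewards[t - 1]
        # Recursive update: rewarded choices add evidence toward that side,
        # unrewarded choices add none, and older evidence decays with tau.
        phi = beta * c_prev * r_prev + decay * phi
        log_odds = alpha * c_prev + phi  # alpha: bias to repeat the last choice
        probs.append(1.0 / (1.0 + math.exp(-log_odds)))
    return probs
```

Because evidence only accumulates through rewarded choices, a run of rewards on one side pushes the predicted choice probability progressively towards that side, while τ sets how quickly that history is forgotten.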

Ach and DA are dynamically regulated

To determine how DA and Ach signals change during the 2ABT, we used frequency-modulated fibre photometry to record the fluorescence of genetically encoded sensors for DA (dLight1.1)41 and Ach (GRAB-Ach3.0, abbreviated as Ach3.0)42, expressed in separate hemispheres within the ventrolateral portion of the dorsal striatum (VLS), a region associated with controlling the behaviour of mice in reward-based decision-making tasks43 (Extended Data Fig. 2a–d,i,j). We observed robust, multiphasic DA and Ach transients in individual trials that differed depending on reward outcome (Fig. 1c,d). These transients depended on neuromodulator binding, because they were absent in ligand-binding-site mutants of the sensors (Extended Data Fig. 2k–p).

To understand which behavioural features affect DA and Ach transients, we compared their profiles during rewarded and unrewarded trials. As expected, DA signals changed at task-relevant behavioural events, and they diverged depending on reward outcome (Fig. 1d). To quantify these signals, we identified a single metric that best captured the changes in each neuromodulator across trial types, and we compared pairs of signals in the denoted conditions. For DA, we calculate the mean of the z-scored signal in a designated time window before or after side-port entry (Extended Data Fig. 3a). For Ach, we take the difference between the maximum and minimum signal in a defined time window, which we call ∆Ach (Extended Data Fig. 3b). Owing to the multiphasic nature of Ach transients, this metric captures differences in Ach signals across conditions more accurately than the mean, in which opposing deflections can cancel. Reward outcome greatly alters DA and Ach transients, with unrewarded trials resulting in a robust decrease in mean DA and a consistent increase in ∆Ach (Fig. 1e). Notably, the changes are significant after (‘post’) but not before (‘pre’) side-port entry, and these signals are not lateralized (Supplementary Information).
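
The two summary metrics can be sketched as follows, assuming each trial's z-scored trace is sampled on a time axis t (in seconds) aligned to side-port entry. The window boundaries used for each comparison are defined in Extended Data Fig. 3 and are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


# Hypothetical sketch of the per-trial metrics; window boundaries are placeholders.
def _window(t, window):
    """Boolean mask for samples with window[0] <= t < window[1] (seconds)."""
    return (t >= window[0]) & (t < window[1])


def mean_da(z_trace, t, window):
    """Mean of a z-scored DA trace within a window relative to side-port entry."""
    return float(np.mean(z_trace[_window(t, window)]))


def delta_ach(z_trace, t, window):
    """∆Ach: maximum minus minimum of the Ach trace within the window."""
    segment = z_trace[_window(t, window)]
    return float(segment.max() - segment.min())
```

For example, `mean_da(da_z, t, (0.0, 1.0))` would average a trace over a placeholder window covering the first second after side-port entry. A dip followed by a peak of similar size leaves the mean near zero but yields a large ∆Ach, which is why the range-based metric suits the multiphasic Ach signal.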

Because our transients are complex, a single metric such as mean DA or ∆Ach captures only limited features of these signals. We therefore complemented this analysis with a supervised classification approach, linear discriminant analysis (LDA), to quantify the degree to which the waveforms of the photometry signals differ across conditions and the degree to which trial-by-trial signals can be used to classify the trial type (Extended Data Fig. 3c). Supporting our observation that reward outcome greatly alters DA and Ach transients only after side-port entry, LDA classification accuracy is high when the classifier is trained on post-side-entry signals (above 80%) but near chance when it is trained on pre-side-entry signals (around 50%) (Fig. 1e).
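
As an illustration of the classification step, a two-class Fisher LDA can be implemented directly in NumPy. This is a sketch under stated assumptions rather than the analysis pipeline used here: the small ridge term added to the pooled covariance, the equal-prior decision threshold and the single train/test split are choices made for this example, and the function names are hypothetical. Each row of X is one trial's photometry waveform within the chosen window, labelled 1 for rewarded and 0 for unrewarded trials.

```python
import numpy as np


# Hypothetical sketch of two-class LDA on trial waveforms; ridge and split are assumptions.
def fit_lda(X, y, ridge=1e-3):
    """Two-class Fisher LDA. X: trials x time bins; y: 0/1 trial labels."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance; the ridge keeps it invertible when
    # there are more time bins than trials.
    scatter = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    cov = scatter / (len(X) - 2) + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(cov, mu1 - mu0)  # discriminant direction
    b = -0.5 * w @ (mu0 + mu1)           # equal-prior threshold between class means
    return w, b


def lda_accuracy(X_train, y_train, X_test, y_test):
    """Fraction of held-out trials whose label is predicted correctly."""
    w, b = fit_lda(X_train, y_train)
    predicted = (X_test @ w + b > 0).astype(int)
    return float(np.mean(predicted == y_test))
```

With two balanced classes, held-out accuracy near 0.5 corresponds to chance, which is the level reported for pre-side-entry signals; evaluating on held-out trials guards against the classifier simply memorizing noise in the training waveforms.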

In contrast to past reports that striatal Ach is outcome-insensitive17, we found that reward outcome robustly modulates Ach transients (Fig. 1d). Further support for reward-outcome modulation of both DA and Ach is revealed during sessions in which the LED cues that signal trial initiation and side-port entry are off but all other 2ABT conditions remain the same. In this scenario, rewards are less expected owing to the absence of the LED cues that normally signal that the centre and side ports are active. Consistent with this change in expectation, the DA transient in cue-omission rewarded trials is significantly increased after side-port entry and decreased before side entry, whereas Ach transients are modulated in the opposite direction (Fig. 1f and Extended Data Fig. 4a).

Choice and reward histories are integral to the decision-making process and can lead to different reward expectations, the effects of which on Ach signalling are poorly understood. We subdivided rewarded and unrewarded trials by the task history and found that the outcome of the previous trial strongly modulates DA and Ach signals in the current trial. When a mouse chooses the same port in two consecutive trials, a rewarded trial following a previously unrewarded trial (‘lose–win’) is more unexpected than a rewarded trial that follows a win (‘win–win’). Indeed, mean DA and ∆Ach signals increase post-side entry in the ‘lose–win’ scenario, reflecting different reward expectations owing to past experience (Fig. 1g and Extended Data Fig. 4d,e). Conversely, for an unrewarded trial, mean DA dips more and ∆Ach rises more if this trial was preceded by a rewarded trial (‘win–lose’) rather than by an unrewarded trial (‘lose–lose’) (Fig. 1g, right and Extended Data Fig. 4d,e). Altogether, this is consistent with the encoding of reward prediction error. Notably, these effects of expectation on DA and Ach are absent if the mouse switches ports between trials—the signals after side-port entry are similar (Fig. 1h and Extended Data Fig. 4d,e). Instead, the DA signals during the transition from centre to side port are greater when the previous trial was unrewarded (Fig. 1h). Thus, when a mouse chooses to switch ports between trials, it approaches this choice in a different state shaped by outcome history; however, it resets any history-dependent reward expectation in the post-side-entry period, during which the mice evaluate the reward outcome. Analysis of the intertrial interval (ITI) signals reveals that, although the motor action may contribute to DA and Ach dynamics, reward expectation significantly shapes both neuromodulator transients (Supplementary Information). Altogether, prior choice and reward experience modulate both DA and Ach.

Finally, we observed that changes in DA and Ach often coincide in time but move in opposite directions. The relationship between the two is neither simple nor fixed, however: in some periods both signals rise or fall together, and in others they change independently, suggesting a flexible and dynamic interaction between the two neuromodulators.

Action-outcome history shapes DA and Ach

To evaluate the contribution of each behavioural event to DA and Ach dynamics formally and quantitatively, we developed a generalized linear model (GLM) that predicts neuromodulator signals from behaviour (Supplementary Information). Our simplest model, which we term the ‘base GLM’, includes variables based on key behavioural events. It captures substantial variance in the trial-associated data (DA GLM R² = 0.206; Ach GLM R² = 0.206) (Extended Data Fig. 5c,d), and its performance is comparable to that of other GLMs used to predict photometry signals33. To assess the degree to which each behavioural variable contributes to GLM performance, we performed a ‘leave-out analysis’ in which we iteratively omit a single behavioural feature and evaluate the performance of the GLM (Extended Data Fig. 5f). For both DA and Ach models, the closely timed centre entry and centre exit are redundant, because omitting either alone does not affect the model fit. By contrast, omitting several other variables greatly increases the mean squared error (MSE) and thus weakens the performance of the GLM, indicating that these variables are necessary to capture the variance in the neural signal. DA and Ach signals are more accurately reconstructed with the addition of side-entry and reward predictors, and side-exit and lick predictors further improve reconstruction of the Ach signal. This analysis highlights the distinct influence of each behavioural event on DA and Ach dynamics.
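
A GLM of this kind can be sketched as a regression of the photometry trace on time-shifted copies of each behavioural event train, so that the fitted weights for one event form its response kernel. The sketch below assumes a Gaussian GLM fitted by ridge-regularized least squares, integer lags in samples and a single contiguous train/test split; the regularization strength, lag range, event names and function names are illustrative assumptions rather than the settings used here. The leave-out function refits the model without all lag columns of one event and reports held-out MSE.

```python
import numpy as np


# Hypothetical sketch of an event-kernel GLM with a leave-one-feature-out analysis.
def build_design(events, n_samples, lags):
    """One column per (event, lag): the event train shifted by `lag` samples.

    events: dict mapping event name -> 0/1 array (1 at each event onset)
    lags: integer offsets; lag > 0 models the response after the event
    """
    cols, names = [], []
    for name, train in events.items():
        for lag in lags:
            shifted = np.zeros(n_samples)
            if lag >= 0:
                shifted[lag:] = train[:n_samples - lag]
            else:
                shifted[:lag] = train[-lag:]
            cols.append(shifted)
            names.append(name)
    return np.column_stack(cols), np.array(names)


def heldout_mse(X_tr, y_tr, X_te, y_te, lam=1.0):
    """Fit ridge regression (unpenalized intercept) on train; return MSE on test."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    coef = np.linalg.solve(A.T @ A + penalty, A.T @ y_tr)
    pred = np.column_stack([np.ones(len(X_te)), X_te]) @ coef
    return float(np.mean((y_te - pred) ** 2))


def leave_one_out(X, names, y, split, lam=1.0):
    """Held-out MSE of the full GLM and of each GLM lacking one event's columns."""
    tr, te = slice(None, split), slice(split, None)
    result = {"full": heldout_mse(X[tr], y[tr], X[te], y[te], lam)}
    for name in sorted(set(names.tolist())):
        keep = names != name
        result["-" + name] = heldout_mse(X[tr][:, keep], y[tr],
                                         X[te][:, keep], y[te], lam)
    return result
```

An event whose removal leaves held-out MSE essentially unchanged (as for either centre-port event above) is redundant given the other regressors, whereas a large increase marks a predictor the model cannot do without.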

Although the base GLM robustly reconstructs the measured Ach and DA transients, its predictions diverge from the data for several trial histories (Extended Data Fig. 5c,d). Given the importance of choice and reward history for modulating the signals of both neuromodulators (Fig. 1g,h), we expanded the feature set of the base GLM to include side-port entries segregated by the eight possible action-outcome combinations, which we term the ‘history GLM’ (Extended Data Fig. 5g,h). Including these parameters reduced the MSE between the predicted and test data for both DA and Ach GLMs (Extended Data Fig. 5f, ‘+ history’). This reflects an improvement in the ability of the history GLMs to capture the variance of the trial-associated data (DA GLM R² = 0.213; Ach GLM R² = 0.214) without overfitting, despite the addition of multiple parameters (Extended Data Fig. 5i). Altogether, by modelling DA and Ach signals with GLMs, we reveal the influence of multiple behavioural variables and of action-outcome history on the dynamics of each neuromodulator during decision-making.
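
One way to construct the history regressors is to label each trial by whether the mouse stayed on or switched from its previous port and by the previous and current outcomes, which yields eight categories (for example, ‘stay: lose-win’). This grouping is an assumption based on the trial types shown in Fig. 1g,h, and the names below are hypothetical. Each resulting event train would enter the GLM design matrix in place of the single side-entry regressor.

```python
from itertools import product

import numpy as np

# Hypothetical sketch: eight assumed histories (stay/switch x previous x current outcome).
HISTORIES = [f"{a}: {p}-{c}" for a, p, c in
             product(["stay", "switch"], ["win", "lose"], ["win", "lose"])]


def history_label(prev_choice, prev_reward, choice, reward):
    """Label a trial by its action (stay/switch) and previous and current outcomes."""
    action = "stay" if choice == prev_choice else "switch"
    prev_out = "win" if prev_reward else "lose"
    cur_out = "win" if reward else "lose"
    return f"{action}: {prev_out}-{cur_out}"


def split_side_entries(side_entry_idx, choices, rewards, n_samples):
    """One 0/1 side-entry train per history; trial 0 lacks a history and is skipped."""
    trains = {h: np.zeros(n_samples) for h in HISTORIES}
    for t in range(1, len(choices)):
        label = history_label(choices[t - 1], rewards[t - 1], choices[t], rewards[t])
        trains[label][side_entry_idx[t]] = 1.0
    return trains
```

Splitting one regressor into eight lets each history acquire its own side-entry kernel, which is how the model can express, for instance, a larger post-entry response on ‘lose-win’ than on ‘win-win’ trials.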

Ach and DA release are anticorrelated

Because DA and Ach might directly interact in vivo, we characterized the relationship between their signals to determine whether it supports the proposed interactions. To assess the dynamics of and relationship between DA and Ach transients more accurately, we performed simultaneous recordings of both neuromodulators within the same hemisphere by coexpressing the green Ach sensor Ach3.0 with the red-shifted DA sensor rDAh44 (Fig. 2b,c and Extended Data Fig. 6k,l). To assess the effect of switching DA sensors, we exploited the fact that DA release is highly correlated across hemispheres within the same mouse (Extended Data Fig. 6c), which allowed us to directly compare DA signals detected by rDAh and dLight1.1. Both sensors yield comparable signals, but with consistently smaller amplitudes for rDAh (Fig. 2a and Extended Data Fig. 6d,i,j), probably reflecting its slower kinetics and higher affinity for DA compared with dLight1.1.

Fig. 2: Ach and DA signals are dynamically correlated during reward-based decision-making.

a, DA release detected by dLight1.1 and rDAh recorded from bilaterally injected mice (left). The average z-scored sensor signal ± s.e.m. is shown (error bands are often narrower than the line thickness) (n = 3 mice). AAV, adeno-associated virus. b, Overlay of simultaneously recorded DA and Ach dynamics, and schematic of the injection strategy. Data are shown as in a (n = 6 mice). c, Confocal images of sensor expression in neurons of the VLS for a representative mouse recorded in b. DAPI is a nuclear marker. Scale bar, 10 µm. d, Covariance of DA and Ach signals from a 2ABT session in which DA lags Ach. Ach signals are compared to DA signals from another session (green) or randomly shifted signals from the same session (orange). The average covariance ± s.e.m. is depicted (n = 6 mice). e, Covariance of trial-segregated DA and Ach signals (left) and their noise (right) in which DA lags Ach by the indicated time. Data are shown as in d. The insets highlight the time offset of the minimum covariance signal. f, Full time-dependent covariance analysis of DA and Ach signals. The average signals ± s.e.m. are shown within the top and left subplots. An enlarged view of the outlined region in white is shown to the right with the time (s) indicated (n = 6 mice). g, Summary of the off-diagonal negative covariance calculated from the matrices in f. h, Photometry kernels produced by a GLM that incorporates behavioural, history and photometry variables. The mean kernels ± s.d. that predict Ach signals from rDAh signals (rDA to gAch) and DA signals from Ach3.0 signals (gAch to rDA) are shown (n = 6 mice).

Simultaneous DA and Ach recordings within the same hemisphere (Fig. 2b) reveal that DA and Ach responses are highly anticorrelated with a positive time lag, suggesting that increases in DA might suppress Ach after a short delay (Fig. 2d), a finding recapitulated by recordings of DA and Ach in separate hemispheres (Supplementary Information). To examine whether the relationship between DA and Ach varies across trial types, we analysed the cross-correlation between these signals. Trial-segregated DA and Ach signals are anticorrelated with a time offset of around 100 ms in both rewarded and unrewarded trials (Fig. 2e and Extended Data Fig. 6f). Because correlations between signals might be driven by external factors such as behavioural events, we also examined the cross-correlation of the fluctuations about the trial-averaged means (that is, ‘noise correlations’) for rewarded and unrewarded trials (Extended Data Fig. 6e). This revealed a similar correlation structure (Fig. 2e), indicating that the anticorrelation is not explained by shared behavioural events alone and suggesting that Ach and DA release interact directly, with DA potentially inhibiting the release of Ach. We complemented this analysis with a GLM that incorporates photometry as a predictive variable, which reveals a similar negative interaction between DA and Ach (Fig. 2h and Supplementary Information).
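
The lagged correlation and noise-correlation analyses can be sketched in NumPy as below, assuming DA and Ach traces are arranged as trials × samples arrays aligned to the same behavioural event. The sign convention (positive lag means Ach is compared with earlier DA, that is, DA leads) and the use of a Pearson correlation pooled across trials are assumptions for this example; lags are applied within trials so that shifts never cross trial boundaries. The function names are hypothetical.

```python
import numpy as np


# Hypothetical sketch of lagged and noise correlations between DA and Ach.
def lagged_corr(da, ach, max_lag):
    """Pearson r between DA(t) and Ach(t + lag), computed within trials.

    da, ach: trials x samples arrays (a 1D array is treated as one trial).
    Positive lag: Ach is compared with earlier DA, i.e. DA leads Ach.
    """
    da, ach = np.atleast_2d(da), np.atleast_2d(ach)
    n = da.shape[1]
    lags = np.arange(-max_lag, max_lag + 1)
    r = []
    for lag in lags:
        if lag >= 0:
            a, b = da[:, :n - lag], ach[:, lag:]
        else:
            a, b = da[:, -lag:], ach[:, :n + lag]
        r.append(np.corrcoef(a.ravel(), b.ravel())[0, 1])
    return lags, np.array(r)


def noise_corr(da, ach, max_lag):
    """Lagged correlation of fluctuations about each signal's trial-averaged mean."""
    return lagged_corr(da - da.mean(axis=0), ach - ach.mean(axis=0), max_lag)
```

Subtracting the trial-averaged waveform removes components that are merely locked to the same behavioural events, so an anticorrelation that survives in `noise_corr` points to trial-by-trial coupling between the two signals rather than a shared response to behaviour.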

The cross-covariance analysis above assumes that the mean and variance of the signals do not change over time (stationarity), but these can be dynamic during behaviour. To account for this, we performed a covariance analysis in which we calculate how fluctuations about the trial-averaged mean of DA at one time point (t1) covary with fluctuations in Ach at another time point (t2) (see Methods and Extended Data Fig. 6g). This yields a two-dimensional function, K(t1,t2), that describes the relationship between fluctuations in DA and Ach at specific times relative to behavioural events, such as entry into the side port. This analysis revealed a strong, time-lagged negative covariance that runs parallel to, but offset from, the diagonal of K (Fig. 2f); we call this band the off-diagonal (Fig. 2g). It shows that, at most time points, changes in DA precede changes in Ach by approximately 100 ms, consistent with the cross-correlation analysis (Extended Data Fig. 6f). Notably, the negative off-diagonal covariance nearly disappears when the mouse enters the side port (Fig. 2f, insets). This analysis highlights the dynamic and context-dependent relationship between DA and Ach within the trial and during the ITI. Phasic increases in DA are typically followed by decreases in Ach, consistent with D2R-mediated suppression of CIN activity, but at specific moments, such as when the mouse enters the side port, this negative correlation is weakened.
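
Concretely, for N trials with traces DA_i(t) and Ach_i(t) on trial i, one plausible estimator is K(t1, t2) = 1/(N − 1) · Σ_i [DA_i(t1) − mean DA(t1)]·[Ach_i(t2) − mean Ach(t2)], where the means are taken across trials at each time point. The sketch below implements this estimator and extracts a lagged diagonal K(t, t + lag); it assumes trial-aligned trials × samples arrays, the estimator described in the Methods may differ (for example, in normalization or smoothing), and the function names are hypothetical.

```python
import numpy as np


# Hypothetical sketch of the time-resolved covariance K(t1, t2) and its lagged diagonal.
def covariance_map(da, ach):
    """K[t1, t2]: across-trial covariance of DA at t1 and Ach at t2.

    da, ach: trials x samples arrays aligned to the same event.
    """
    da_res = da - da.mean(axis=0)     # fluctuations about the trial-averaged DA
    ach_res = ach - ach.mean(axis=0)  # fluctuations about the trial-averaged Ach
    return da_res.T @ ach_res / (da.shape[0] - 1)


def off_diagonal(K, lag):
    """K[t, t + lag] for every valid t: DA at t versus Ach `lag` samples later."""
    return np.diagonal(K, offset=lag)
```

Because the means are computed separately at every time point, K does not assume stationarity; plotting `off_diagonal(K, lag)` against t, with lag set to about 100 ms expressed in samples, shows where within the trial the DA-to-Ach coupling weakens.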

Striatal DA dynamics do not require CINs

The DA transients we observe during the 2ABT could be driven by the release of Ach from CINs, by the activity of DANs or by a combination of both. To determine whether Ach released from CINs contributes to DA release, we blocked Ach release from CINs in the VLS by expressing tetanus toxin light chain (TelC), which prevents the fusion of synaptic vesicles in these cells (Fig. 3a,b) and potently inhibits the release of Ach from CINs in vitro (Fig. 3c) and in vivo (Fig. 3d and Extended Data Fig. 7g–j). Because individual DAN axons arborize extensively, an axonal action potential triggered by CINs elsewhere in the striatum could propagate to, and evoke DA release in, the VLS; we therefore reasoned that synchronized CIN activity across the striatum might be sufficient to drive DA release within the VLS, where we record. We thus perturbed CINs throughout the striatum by expressing TelC with a multisite injection approach (Fig. 3e and Extended Data Fig. 7l–o). This widespread loss of Ach induced severe behavioural deficits and greatly altered behaviourally evoked DA signals (Fig. 3e and Extended Data Fig. 9f–i). These marked behavioural changes underscore the importance of CINs in regulating striatal function; however, they make it difficult to interpret the effects of Ach loss on the reward-encoding properties of DA. Nevertheless, DA retained its capacity to encode reward: DA signals (Fig. 3e) and their associated GLM kernels (Extended Data Fig. 7k) maintained opposite polarities for rewarded and unrewarded outcomes. Thus, reward-encoding features of DA can persist despite severe loss of Ach.

Fig. 3: Ach does not regulate DA dynamics during decision-making.

a, Schematic of TelC perturbation of Ach release. b, Epifluorescence images of TelC linked to mCherry (TelC–mCh) expressed in choline acetyltransferase (ChAT)-positive cells in the VLS (n = 12 mice). White arrowheads denote two CINs that coexpress ChAT and TelC. Scale bar, 50 µm. c, DA release as measured by carbon-fibre amperometry in an acute striatal slice containing CINs coexpressing Chrimson with mCherry (mCh) or TelC. Amperometry recordings (mean ± s.e.m.) are aligned to laser stimulation. d, Ach release in the VLS during rewarded 2ABT trials recorded with fibre photometry from mice with CINs expressing mCh or TelC in the VLS. The average ∆F/F0 of the sensor signal ± s.e.m. is shown (n = 3 mice). e, Injections and implantations (left) for fibre photometry to measure DA release (right) in the context of striatum-wide expression of mCh or TelC in CINs. Unilateral injections were performed in two separate cohorts of mice. The side-entry (SE)-aligned average z-score of the sensor signal ± s.e.m. is shown (n = 5 mice per condition). f, Injections and implantations (left) for fibre photometry to measure DA release (right) in the context of VLS-selective and CIN-specific expression of TelC or mCh in separate hemispheres of the same brain. Data are shown as in e (n = 4 mice). g, Optogenetic inhibition of DAN cell bodies with simultaneous recordings of DA release in the VLS. Schematic of the injections and implantations (left), and summary of DA release during rewarded 2ABT trials from mice lacking opsin expression (middle) or expressing the inhibitory opsin stGtACR2 (right). The average ∆F/F0 of rDAh ± s.e.m. is indicated (n = 3 mice).

Given that the proposed mechanism is local (CIN activity triggers DA release within a local DAN axon field), we tested whether DA release is affected by VLS-selective inhibition of Ach release. In mice performing the 2ABT, we inhibited the release of Ach in the VLS of one hemisphere using TelC and compared DA release with that in the other hemisphere, in which VLS CINs express a control protein (Fig. 3f, left and Extended Data Fig. 8a–c). Mice did not exhibit behavioural deficits (Extended Data Fig. 9j–m), and VLS-specific loss of Ach release did not affect DA dynamics in trials recorded with dLight1.1 (Fig. 3f) or with rDAh, in recordings that also validated the suppression of Ach release (Extended Data Figs. 8g,i and 9a,b). Notably, phasic DA release was unchanged during motivated approach behaviours (that is, around centre-port entry and immediately before side-port entry); these periods have been proposed as times when CINs could drive DA release, because DA levels and DAN activity are poorly correlated during them31. To address whether CINs might mediate discrepancies between DAN firing and DA release over longer timescales31, we parsed DA signals by the choice and reward outcome histories of one, two and three trials back (Extended Data Fig. 8d,h,j,k), but we did not observe any significant changes in DA dynamics or in the underlying GLM kernels of DA for any input feature after loss of Ach (Extended Data Fig. 8e). Finally, to address whether Ach loss lowers the overall magnitude of DA release throughout the trial, we analysed the amplitudes of DA sensor fluorescence transients (∆F/F0), but this analysis did not reveal consistent effects (Extended Data Fig. 8f).

Because the DA transients we observe during the 2ABT are not affected by local Ach release, DAN activity, rather than CIN activity, is likely to be the major driver of DA dynamics. Indeed, inhibition of DAN activity robustly alters DA release, as evidenced by a significant reduction in DA levels after optogenetic inhibition of DANs with stGtACR2 (Fig. 3g and Extended Data Fig. 9c–e). Altogether, we find that loss of Ach release within the VLS does not impair DA dynamics. Although modulation of CIN activity is sufficient to drive DA release in vivo (Supplementary Information), the contexts in which it does so remain to be determined.

DA inhibits Ach release through D2Rs

During a trial, there are two periods in which opposite-signed changes in DA and Ach signals coincide and during which we hypothesize that D2Rs might mediate the depression of Ach: first, as the mice move from the centre to the side port; and second, after side-port entry in rewarded trials. To test this, we assayed whether optogenetic manipulations of DANs in vivo affect striatal Ach levels in a D2R-dependent manner. We increased or decreased DA levels in the VLS by photostimulating DAN somas expressing excitatory (Chrimson) or inhibitory (stGtACR2) opsins in head-fixed mice on a wheel (Fig. 4a and Extended Data Fig. 10a–f). These manipulations shifted Ach levels in the direction opposite to the optogenetically evoked changes in DA, consistent with DA inhibiting Ach release (Fig. 4b,c). Notably, these effects are D2R-dependent, as they are abolished by the administration of the D2R antagonist eticlopride (Fig. 4b,c). Thus, changes in DA are sufficient to bidirectionally regulate Ach levels in vivo, consistent with basal engagement and dynamic modulation of DA-dependent inhibition of CINs.

Fig. 4: D2Rs are required for DA-mediated inhibition of Ach signals.

a, Injection and fibre-implantation strategy for optogenetic manipulation of DANs and photometric recordings in the VLS (top). Eticlopride was applied during these recordings (bottom). b, Ach and DA release during Chrimson-mediated DAN activation with an intraperitoneal injection of saline (left) or eticlopride (right) before the recording. The average ∆F/F0 ± s.e.m. is shown (n = 3 mice). Data are aligned to laser stimulation onset. c, Ach and DA release during stGtACR2-mediated inhibition of DANs. Data are shown as in b (n = 3 mice). d, Injection and recording set-up to determine the effect of DA release on CIN firing in an acute striatal slice. ChR2-positive DAN afferents are activated with a laser pulse, and CIN firing is recorded with a cell-attached pipette. e, Representative single-cell responses to stimulation (orange) of ChR2-positive DAN afferents recorded from CINs with (WT) or without (Drd2-cKO) D2R expression. Each black vertical line denotes a CIN action potential. f, Population perievent spike histograms of average spike discharge ± s.e.m. for cells recorded in d (WT: n = 9 cells; Drd2-cKO: n = 9 cells). g, Trial-averaged Ach and DA recordings aligned to side-port entry (SE). Average signals ± s.e.m. are depicted (WT: n = 12 mice; Drd2 f/f: n = 7 mice; Drd2-cKO: n = 8 mice). The changes in Ach signal that occur when the mice move from centre to side port (orange arrow) and after side-port entry (green arrow) are shown. h, Covariance of DA and Ach dynamics in mice shown in g, in which DA lags Ach by the indicated time. The average covariance of trial-segregated signals (top) and their noise (bottom) ± s.e.m. is shown. i, Moment-to-moment covariance matrices for mice in g of DA and Ach release with their respective off-diagonal signals ± s.e.m.

Because D2Rs are expressed by other cell types in the brain and D2R blockade has major behavioural effects that prevent mice from performing the task45, we used an alternative method to determine whether DA suppresses the release of Ach during the task. We used a genetic strategy to knock out (KO) D2Rs specifically in CINs, and we refer to this transgenic mouse line (ChAT-IRES-Cre; Drd2f/f) as Drd2-cKO. To confirm the functional loss of D2Rs in CINs, we compared the ability of DA to reduce CIN firing in striatal slices from wild-type versus Drd2-cKO mice (Fig. 4d). In wild-type mice, release of DA after laser stimulation of channelrhodopsin-expressing DAN terminals robustly reduced CIN firing, as measured by cell-attached recordings, but this effect was absent in CINs from Drd2-cKO mice (Fig. 4e,f).

To determine how D2R loss in CINs affects the release of Ach during the 2ABT, we compared neuromodulator dynamics in the VLS of Drd2-cKO mice with those in two control groups: ChAT-IRES-Cre mice (referred to as wild type) and Drd2-floxed mice (referred to as Drd2 f/f) (Extended Data Fig. 10g,h). We found that the loss of D2Rs in CINs abolished both instances of Ach suppression that coincide with an increase in DA levels (Fig. 4g). Together, these changes lead to significantly increased Ach signals (Extended Data Fig. 11d). Modelling these signals with the history GLM recapitulates these effects (Supplementary Information). Of note, these changes occurred without significant changes in DA dynamics during the task across the three genotypes (Fig. 4g and Extended Data Fig. 11d), which further supports the conclusion that Ach release has little or no effect on DA signals. Thus, D2Rs are required for DA to inhibit Ach release in vivo during precise moments within a trial.

To determine whether the loss of D2Rs in CINs affects decision-making, we assessed the performance of Drd2-cKO mice in the 2ABT. Although general performance metrics, block transition dynamics and RFLR coefficients (Extended Data Fig. 11f–i) are comparable between Drd2-cKO mice and both control groups, differences emerge when performance is parsed by history. Compared with both Drd2 f/f and wild-type cohorts, Drd2-cKO mice are impaired in their ability to switch ports after some histories (Extended Data Fig. 11j). This supports a role for D2R-dependent reductions in Ach release in promoting complex changes in behaviour. In conclusion, DA acting through D2Rs represses Ach signals during precise moments within a trial, and loss of this regulation impairs the normal switching behaviour of mice.

The cortex and thalamus drive Ach release

Although DA shapes Ach signals during decision-making, we observe additional fluctuations in Ach that are independent of DA. During unrewarded trials, Ach signals remain repressed after side-port entry even in Drd2-cKO mice (Fig. 4g), and additional inputs must drive the increases in Ach that occur upon side-port entry and during the consumption period. Finally, the momentary disruptions of the negative covariance between Ach and DA signals point to other factors that can independently alter Ach and DA dynamics (Fig. 2f,g).

To identify other potential sources of regulation of striatal Ach, we examined inputs to the striatum from the cortex and the thalamus, both of which synapse onto CINs and modulate their firing rates34,35,46. To determine whether these regions project to the VLS, we performed retrograde tracing with cholera toxin. We find that cells distributed across multiple cortical regions send afferents to the VLS (Fig. 5a), whereas thalamic inputs to this striatal region originate predominantly from the parafascicular nucleus, consistent with previous observations47.

Fig. 5: Corticostriatal and thalamostriatal inputs are necessary to drive Ach signals in the VLS.

a, Retrograde labelling of cortex and thalamus with cholera toxin (Ctb) injected into the VLS. Images of the injection site (left) and representative coronal sections that depict Ctb-positive cortical and thalamic cell bodies (right) are shown (n = 3 mice). Scale bars: white, 1 mm; green, 200 µm. b, Glutamate release from 2ABT trials measured with iGluSnFR. Average signals ± s.e.m. are shown (n = 4 mice). Contra, contraversive; ipsi, ipsiversive. c, Overlay of glutamate (glut) and Ach release recorded from separate mice during the indicated trial types. Data are presented as in b (glutamate: n = 4 mice; Ach: n = 9 mice). d, Injection and fibre-implantation strategy for photometry of calcium dynamics in thalamic and cortical terminals in the VLS. Data are shown as in b (glutamate: n = 4 mice; thalamic calcium: n = 6 mice; cortical calcium: n = 5 mice). e, Injection and fibre-implantation strategy for TelC or mCh expression in the cortex with Ach recordings in the VLS (top). ∆F/F0 (middle) and z-scored signals (bottom) of the average Ach release from the indicated treatment groups. Data are shown as in b (mCh: n = 4 mice; TelC: n = 5 mice). f, Injection and fibre-implantation strategy for TelC or mCh expression in the thalamus with Ach recordings in the VLS (top). ∆F/F0 (middle) and z-scored signals (bottom) of the mean Ach release from the indicated treatment groups. Data are shown as in b (mCh: n = 3 mice; TelC: n = 5 mice). g, Schematic summarizing findings. Trial initiation evokes the release of multiple neurotransmitters in the VLS, which interact to influence decision-making. Glutamate release from the cortex and thalamus is necessary to promote Ach release (orange box), whereas DA release inhibits it at specific trial moments through D2Rs (purple boxes). Together, these interactions guide future actions.


To assess the potential influence of these glutamatergic inputs on striatal Ach, we first determined whether striatal glutamate and Ach signals are correlated. Using the glutamate sensor iGluSnFR48 (Extended Data Fig. 12a,b), we find that striatal glutamate levels vary during the task but are not lateralized (Fig. 5b), allowing us to combine signals from ipsiversive and contraversive trials. Like Ach, glutamate signals are suppressed before the choice and increase upon side-port entry (Fig. 5c). In rewarded trials, glutamate shows an additional phase of sustained increase during consumption, which is absent in unrewarded trials (Fig. 5c). The activity of cortical and thalamic terminals in the VLS, measured with genetically encoded calcium sensors, coincides with glutamate release in both trial types, suggesting that both inputs can contribute to glutamate release in this region (Fig. 5d and Extended Data Fig. 12c–f). Altogether, these data show substantial coincident dynamics in cortical and thalamic inputs to the striatum, consistent with the possibility that these glutamatergic inputs drive changes in Ach levels.

To test whether each input is required for Ach release, we expressed TelC unilaterally in the thalamus or cortex (Fig. 5e,f and Extended Data Figs. 12g–j and 13a–d). Reflecting the importance of cortical and thalamic inputs in regulating reward-based decision-making, both perturbations impaired multiple aspects of 2ABT performance, including switch dynamics after rewarded trials and after block transitions, across multiple choice-outcome histories (Extended Data Fig. 13f–m). Consistent with this impairment, fitting the RFLR model revealed a reduction in β and an increase in τ, reflecting a reduced weight given to previous evidence and a faster rate of information decay, respectively (Extended Data Fig. 13i,m).

Loss of neurotransmission from either region robustly dampened Ach transients across all trials, as seen in the lower ∆F/F0 of the Ach sensor signal (Fig. 5e,f, middle). The suppression of ∆F/F0 was strong, consistent across mice and large enough to overcome underlying variability in the signals (Extended Data Fig. 13e), unlike the effects of CIN perturbation on DA levels. Analysis of the remaining Ach transients reveals distinct ways in which cortical and thalamic inputs modulate Ach levels. In unrewarded trials, loss of cortical but not thalamic inputs perturbs Ach transients after side-port entry, suggesting that the cortex has a specific role in driving this signal. In addition, loss of thalamic input shifts the timing of Ach transients more than loss of cortical input does, which might reflect the greater behavioural disruption in mice with thalamic TelC injections. Overall, our results reveal that both the cortex and the thalamus are required to sustain Ach levels during decision-making, and that each input can uniquely alter the dynamics of striatal Ach within a trial.

Discussion

DA and Ach are crucial neuromodulators that directly affect each other’s release in the striatum in vitro. However, whether these interactions regulate neuromodulator levels in vivo, particularly during decision-making, is largely unknown. To address this, we evaluated how striatal DA and Ach dynamics are regulated by the proposed bidirectional circuit during a task that requires mice to make choices flexibly within a changing environment. We found that DA and Ach signals are generally anticorrelated over time, but that this relationship is dynamic and modulated by action-outcome history. Although striatal Ach release does not modulate DA dynamics during the 2ABT, DA exerts a key influence on Ach signals through D2Rs. Without this interaction, the influence of action and reward history on decision-making is diminished. In addition to DA-mediated inhibition of Ach release, cortical and thalamic inputs concurrently drive Ach release and contribute to both basal Ach levels and reward-outcome-dependent transients. In conclusion, by using a diverse toolset to interrogate and alter neuromodulator levels during a complex behavioural task, we establish a precise in vivo role for a long-defined in vitro circuit and reveal new modes of CIN regulation by dopaminergic and glutamatergic inputs (Fig. 5g). Moreover, our findings provide a framework for further studies aimed at a deeper understanding of the neurochemical basis of decision-making and behaviour (Supplementary Information).

Methods

Mice

The following mouse lines were used: C57BL/6J (The Jackson Laboratory, 000664); ChAT-IRES-Cre (The Jackson Laboratory, 006410); DAT-IRES-Cre (The Jackson Laboratory, 006660); Drd2loxP (The Jackson Laboratory, 020631); Vglut2-IRES-Cre (The Jackson Laboratory, 028863); and Vglut1-IRES-Cre (The Jackson Laboratory, 023527). All mice were bred on a C57BL/6J genetic background, and heterozygotes were used unless noted otherwise. For behaviour experiments, 6–8-week-old male mice were used. For all experiments, a sample size of at least 3 was chosen without the use of a statistical test. No randomization or blinding was performed. All animal care and experimental manipulations were performed in accordance with protocols approved by the Harvard Standing Committee on Animal Care, following guidelines described in the US NIH Guide for the Care and Use of Laboratory Animals.

Intracranial injections

Mice were anaesthetized with 5% isoflurane and maintained during surgery with 1.5% isoflurane and 0.08% O2. With the mouse in a stereotaxic frame (David Kopf Instruments), the skull was exposed under aseptic conditions, a small craniotomy (about 300 μm) was drilled and the virus (Supplementary Information) was injected into the following regions, with coordinates given relative to bregma: VLS (0.6 mm A/P, ±2.3 mm M/L and 3.2 mm D/V); SNc and VTA (−3.35 mm A/P, ±1.75 mm M/L and 4.3 mm D/V); thalamus (−2.1 mm A/P, ±1.0 mm M/L and 3.5 mm D/V); and prefrontal cortex (PFC; 2.0 mm A/P, ±0.4 mm M/L and 2.3 mm D/V).

Injections were performed as previously described49. A pulled glass pipette was held in the brain for 3 min, after which viruses were infused with a syringe pump (Harvard Apparatus, 883015) at 50 nl min−1 (VLS), 30–40 nl min−1 (PFC) or 70 nl min−1 (SNc and VTA). Each injection site received 350 nl, except for Ctb 555 injections (50 nl at 4 µg µl−1). Pipettes were withdrawn slowly (less than 10 µm s−1) at least 6 min after the end of the infusion.
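As a worked example of the injection timing above, the time spent per site can be estimated from the stated volume, infusion rates and dwell times. This is a sketch only: the helper name is hypothetical, the 3-min dwell is assumed to precede infusion and the withdrawal distance is assumed to equal the D/V coordinate. Because the withdrawal speed (less than 10 µm s−1) and post-infusion dwell (at least 6 min) are stated as bounds, the result is a lower bound.

```python
def min_injection_time_min(volume_nl, rate_nl_per_min, depth_mm,
                           pre_dwell_min=3.0, post_dwell_min=6.0,
                           max_withdraw_um_per_s=10.0):
    """Lower bound (minutes) on the time spent at one injection site.

    Hypothetical helper. Assumes the 3-min dwell precedes infusion and that
    the pipette is withdrawn over the full D/V depth at the maximum stated speed.
    """
    infusion_min = volume_nl / rate_nl_per_min
    withdraw_min = (depth_mm * 1000.0) / max_withdraw_um_per_s / 60.0
    return pre_dwell_min + infusion_min + post_dwell_min + withdraw_min


# VLS: 350 nl at 50 nl/min and 3.2 mm D/V -> 3 + 7 + 6 + 5.33 min
print(round(min_injection_time_min(350, 50, 3.2), 2))  # 21.33
```

For the SNc and VTA (350 nl at 70 nl min−1, 4.3 mm D/V), the same estimate gives about 21.2 min, because the faster infusion is offset by the deeper withdrawal.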

For AAV injections, the wound was sutured. For fibre implants, after AAV injection, the skull was scored lightly with a razor blade to promote glue adhesion. Then, a 200-µm blunt-ended fibre (MFC_200/230-0.48_4 mm, Doric Lenses) was slowly inserted into the brain until it reached 100 µm above the injection site. The fibre was held in place with glue (Loctite gel, 454) and hardening was accelerated with the application of Zip Kicker (Pacer Technology). A metal headplate was glued at lambda and white cement (Parkell) was applied on top of the glue to further secure the headplate and fibres. Fibre implants were protected with a removable plastic cap (Doric Lenses) until recordings.

After the surgery, mice were placed in a cage with a heating pad until they recovered activity and were then returned to their home cage. Mice were given pre- and post-operative oral carprofen (CPF, 5 mg per kg per day) as an analgesic and were monitored daily for at least four days after surgery. Experiments began at least four weeks after virus injection, except for retrograde tracer injections, for which one week was allowed. Of note, to detect thalamic activity, we injected jRCaMP1b at the thalamic cell bodies and recorded from thalamic terminals in the VLS (Fig. 5d and Extended Data Fig. 12c,d). By contrast, a multisite injection strategy was required for cortical inputs given their widespread distribution (Fig. 5d, bottom left and Extended Data Fig. 12e,f). In addition, we found that a brighter calcium sensor, GCaMP8, was necessary to detect cortical signals arising from these dispersed sources. For cortical inputs, we used a retrograde AAV approach in Vglut1-IRES-Cre mice to restrict toxin expression to cells that project to the VLS (Fig. 5e and Extended Data Fig. 13a–d). However, for thalamic inputs, we could not use a retrograde approach in Vglut2-IRES-Cre mice because Vglut2 (also known as Slc17a6) is also expressed in the cortex (Allen Institute); we therefore injected TelC directly into the thalamus of Vglut2-IRES-Cre mice (Fig. 5f and Extended Data Fig. 12g–j).

Immunohistochemistry

Mice were anaesthetized by isoflurane inhalation and transcardially perfused with phosphate-buffered saline (PBS) followed by 4% paraformaldehyde (PFA) in PBS. Brains were extracted and stored in 4% PFA in PBS for at least 8 h, or in 4% PFA and 0.02% sodium azide in PBS for long-term storage at 4 °C. A shallow cut was made in the right hemisphere with a razor blade so that the hemispheres could be identified after slicing. Brains were cut into 70-μm-thick free-floating sections with a Leica VT1000 S vibratome. Selected slices were transferred to a six-well plate and rinsed three times for 5 min each in PBS. They were then blocked with rotation at room temperature for 1 h in blocking buffer (5% normal goat serum (Abcam) and 0.2% Triton X-100 in PBS). The blocking buffer was replaced with 500–700 μl of a solution containing the indicated primary antibody (Supplementary Information), and slices were incubated overnight at 4 °C with side-to-side rotation. The next day, slices were transferred to a clean well and washed five times for 5 min each in PBST (PBS with 0.2% Triton X-100). After the final wash, slices were incubated for 1.5 h in 500–700 μl of the indicated secondary antibody diluted 1:500 in blocking buffer. Slices were washed four times for 5 min each in PBST and then four times for 5 min each in PBS before mounting with ProLong Diamond Antifade Mountant with DAPI (Thermo Fisher Scientific). Slices were imaged with an Olympus VS120 slide-scanning microscope or a spinning-disk confocal microscope.

Behaviour apparatus, training and task

The behaviour apparatus was as described previously40, with the following modifications. Clear acrylic barriers 5.5 cm in length were installed between the centre and side ports before training to lengthen trials and thereby obtain better-resolved photometry recordings. Water was delivered in 3-μl increments. Hardware and software to control the behaviour box are available online: https://github.com/HMS-RIC/TwoArmedBandit.

Singly housed male mice were restricted to 1 ml water per day before training and were maintained at no less than 80% of their initial body weight for the full duration of training and photometry. All training sessions were conducted in the dark under red light. A blue LED above the centre port signals the mouse to initiate a trial by poking in the centre port. Blue LEDs above the side ports are then activated, signalling the mouse to poke in the left or right port within 5 s. At any given time, only one side port rewards water. Reward probabilities are defined by custom software (MATLAB). Withdrawal from the side port ends the trial and begins a 1-s ITI, after which the mouse can self-initiate the next trial. An expert mouse can perform 200–300 trials in a session.

To train the mice to proficiency, they progressed through incremental training stages. Each training session lasted 30–60 min, adjusted according to the mouse’s performance. Mice progressed to the next stage once they completed at least 100 successful trials with a reward rate of at least 75%. On the first day, mice were habituated to the behaviour box, with water delivered from both side ports and triggered only by a side-port poke. In the next stage, mice learned the trial structure: only a poke in the centre port followed by a poke in a side port delivered water. The mice then learned the block structure, in which the deterministically rewarded port (Phigh = 100%) switched to the other side after 30 successful trials at that port. Next, mice performed trials with barriers between the centre and side ports; a series of transparent barriers of increasing length (extra-small (1.5 cm), small (3 cm), medium (4 cm) and long (5.5 cm)) aided learning. Finally, the mice were trained with probabilistic reward delivery (Phigh = 95%). Once the mice were proficient, optical fibres were implanted.
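The block stage described above can be sketched as a small simulation in which the high-probability port switches after a fixed number of rewarded trials. This is an illustrative sketch under stated assumptions, not the task software: the function name and port labels are hypothetical, the other port is assumed never to reward, and applying the same 30-reward switching rule at Phigh = 95% is an assumption rather than the task's specification.

```python
import random


def simulate_blocks(choices, p_high=1.0, rewards_per_block=30, seed=0):
    """Return a (choice, high_port, rewarded) tuple for each trial.

    Hypothetical sketch of the block stage: only the current high port can
    reward (with probability p_high), and the high port switches sides after
    `rewards_per_block` rewarded trials at that port.
    """
    rng = random.Random(seed)
    high_port = "L"  # assumed starting side
    rewards_in_block = 0
    log = []
    for choice in choices:
        rewarded = choice == high_port and rng.random() < p_high
        log.append((choice, high_port, rewarded))
        if rewarded:
            rewards_in_block += 1
            if rewards_in_block == rewards_per_block:
                high_port = "R" if high_port == "L" else "L"  # block transition
                rewards_in_block = 0
    return log
```

With Phigh = 100%, a simulated mouse that always chooses the left port is rewarded on its first 30 trials and then goes unrewarded once the high port has switched to the right. This is the situation in which a switch in choice is required.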

After fibre-implant surgeries, mice were retrained to achieve the same pre-surgery performance level. Habituation to head fixation on a wheel followed by habituation to attachment of a mock photometry patchcord was performed over successive days for each mouse. Head fixation was done to temporarily restrain the mice to make it easier to attach and secure the patchcord for stable photometry recordings. Recordings were performed four weeks after surgery to allow for stable viral expression levels as well as a consistent and proficient level of task performance from the mice. In experiments in which the LED cue is omitted (‘cue-omission’ trials), we turned off the LEDs located above the centre and side ports but left all other task parameters and recording conditions unchanged.

Photometry and behaviour recordings

Fibre implants were connected to a 0.48 NA patchcord (Doric Lenses, MFP_200/220/900-0.48_2m_FCM-MF1.25, low-autofluorescence epoxy), which delivered excitation light and carried emitted light to a Doric filter cube (blue excitation light (465–480 nm); red excitation light (555–570 nm); green emission light (500–540 nm); red emission light (580–680 nm) (FMC5_E1(465-480)_F1(500-540)_E2(555-570)_F2(580-680)_S, Doric Lenses)). Excitation light from Thorlabs LEDs was amplitude-modulated using MATLAB at 167 Hz (470-nm excitation light, M470F3, Thorlabs; LED driver LEDD1B, Thorlabs) and 223 Hz (565-nm excitation light, M565F3, Thorlabs; LED driver LEDD1B, Thorlabs). The following excitation light powers were used for the indicated sensors: dLight1.1 (25 µW); Ach3.0 (25 µW); rDAh (45 µW); and iGluSnFR (15 µW). Signals from the photodetectors were amplified in DC mode with Newport photodetectors or Doric amplifiers and acquired by a LabJack (T7) streaming at 2,000 Hz. The LabJack also received synchronous information about behavioural events logged by the Arduino that controls the behaviour box. The following events were recorded: centre-port entry and exit, side-port entry and exit, lick onset and offset, and LED-light onset and offset. Photometric recordings and behavioural performance were analysed as described (Supplementary Information).
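The 470-nm and 565-nm LEDs are modulated at different carrier frequencies (167 and 223 Hz), and the photodetector output is sampled at 2,000 Hz. Each sensor's signal can therefore be recovered by demodulating at its carrier. The procedure actually used is described under 'Signal demodulation' in the Supplementary Information. The sketch below instead shows a generic quadrature lock-in approach; the 15-Hz low-pass cutoff, the filter order and the function name are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 2000.0  # Hz, acquisition rate stated in the Methods


def lockin_demodulate(raw, carrier_hz, fs=FS, cutoff_hz=15.0, order=4):
    """Recover the amplitude envelope of `raw` at `carrier_hz`.

    Quadrature lock-in sketch: multiply by sine and cosine references at the
    carrier, low-pass both products (zero-phase) and combine them, so that the
    unknown carrier phase does not matter. Assumed parameters, not the
    published pipeline.
    """
    t = np.arange(raw.size) / fs
    ref_sin = np.sin(2 * np.pi * carrier_hz * t)
    ref_cos = np.cos(2 * np.pi * carrier_hz * t)
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    in_phase = sosfiltfilt(sos, raw * ref_sin)
    quadrature = sosfiltfilt(sos, raw * ref_cos)
    # Each low-passed product is (A/2) * cos or sin of the phase offset.
    return 2.0 * np.hypot(in_phase, quadrature)
```

The 56-Hz difference between the two carriers lies well above the assumed 15-Hz cutoff. Cross-talk between channels is therefore strongly attenuated, while slow sensor dynamics pass through unchanged.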

Optogenetic manipulations

All optogenetic stimulations were triggered by side-port entry and lasted for a fixed duration adjusted to the average side-port occupancy of the mice in each experimental cohort, so that stimulation would not extend into the next trial and introduce ectopic effects. For optogenetic stimulation with Chrimson during behaviour (Extended Data Fig. 7b), 15 mW of a 590-nm laser (Optoengine) was delivered for 1.5 s in 25% of trials, interleaved throughout the session. The excitation light was delivered via the Doric filter cube, producing a laser stimulation artefact that was removed from the recordings. Only one hemisphere was illuminated in each session. For optogenetic manipulation of DANs (Figs. 3g and 4b,c), 15 mW of a 590-nm laser was used for Chrimson and 0.7 mW of a 463-nm laser for stGtACR2, each for 5 s. For optogenetic stimulation of head-fixed mice on a wheel, the laser was applied for 1.5 s with a 45-s ITI, repeated 20 times per session. The signals displayed are session averages (Fig. 4b,c and Extended Data Fig. 7a,e). The photometry signal baseline was calculated by averaging the signal in the 1.5 s before laser stimulation across the 20 sweeps.
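The baseline step described above can be sketched as follows, with F0 taken as the signal averaged over the 1.5 s before laser onset across all 20 sweeps. The array layout (sweeps × samples), the use of one pooled F0 for every sweep and the function name are assumptions for illustration.

```python
import numpy as np


def session_dff(sweeps, fs, onset_s, baseline_s=1.5):
    """Session-averaged dF/F0 for laser-aligned sweeps.

    sweeps: array (n_sweeps, n_samples) of demodulated fluorescence, each
    aligned so that laser onset falls at `onset_s` seconds.
    F0 is the mean over the `baseline_s` window before onset, pooled across
    all sweeps into a single value (an assumption of this sketch).
    """
    sweeps = np.asarray(sweeps, dtype=float)
    onset = int(round(onset_s * fs))
    start = onset - int(round(baseline_s * fs))
    if start < 0:
        raise ValueError("baseline window extends before the start of the sweep")
    f0 = sweeps[:, start:onset].mean()
    dff = (sweeps - f0) / f0
    return dff.mean(axis=0)  # average across the sweeps of one session
```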

GLM

Photometry recordings and behavioural data used for GLMs were collected from the indicated mice, with 3–6 sessions per mouse and approximately 150–300 trials per session, of which typically more than 75% are rewarded. These data were aligned to behavioural events (see ‘Signal demodulation’ in Supplementary Information) to create a predictive matrix X (of dimensions N × F) and a response vector, y (of dimension N), where N is the number of time steps recorded in the session and F is the number of predictors in the analysis. Except for instances in which photometry variables were used as predictors, the GLM features consisted of values 0 and 1 to indicate if a behavioural event (for example, a lick) occurred in the time bin.
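As an illustration of how the 0/1 behavioural predictors might be built, the sketch below bins event timestamps into a matrix X of shape N × F. The bin width (taken here as 54 ms, matching the stated time-shift step), the dictionary input format and the function name are assumptions. Photometry-valued predictors are not handled.

```python
import numpy as np


def event_indicator_matrix(event_times, n_bins, bin_s=0.054):
    """Build a binary predictor matrix X (n_bins x F) from event timestamps.

    event_times: dict mapping a feature name (e.g. "lick") to event times in
    seconds. Entry X[t, f] is 1 if feature f occurred in time bin t.
    Hypothetical helper; bin width is an assumption.
    """
    names = sorted(event_times)
    X = np.zeros((n_bins, len(names)), dtype=np.int8)
    for f, name in enumerate(names):
        idx = np.floor(np.asarray(event_times[name], dtype=float) / bin_s).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]  # drop events outside the record
        X[idx, f] = 1
    return X, names
```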

For each predictive matrix, a design matrix φ(X) (of dimensions N × F(2T + 1)) was constructed from T time shifts forward and backward (T = 20, 54 ms each), yielding GLM coefficients that correspond to time-based kernels for each predictive feature in X. Data from the ITI period, which contains no task-relevant behavioural events, were excluded; only data spanning from shortly before centre-port entry to shortly after side-port exit were modelled. When initial and final time shifts spanned the boundary between two trials, the overlapping data were included twice (once in each of the trials on either side of the boundary) to ensure sufficient representation of each event in the training, validation and test datasets. Because of the variability in ITIs, this duplication resulted in approximately 1.5% to 17.3% of the data points being present in both the training and the test datasets.
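The time-shift expansion φ(X) can be sketched with NumPy as below. Each predictor column is copied 2T + 1 times, shifted by −T to +T bins, so that the fitted coefficients form a kernel spanning ±T bins (±1.08 s for T = 20 and 54-ms shifts). This sketch assumes a single continuous segment and zero-pads at the edges. It omits the ITI exclusion and trial-boundary duplication described above, and the function name is hypothetical.

```python
import numpy as np


def time_shifted_design(X, T=20):
    """Expand X (N x F) into phi (N x F*(2T+1)) of time-shifted copies.

    Column f*(2T+1) + (k+T) holds X[t-k, f] for shift k in [-T, T], with zeros
    where t-k falls outside the recording. Positive k therefore lets an event
    predict the signal k bins later.
    """
    N, F = X.shape
    width = 2 * T + 1
    phi = np.zeros((N, F * width), dtype=float)
    for f in range(F):
        for j, k in enumerate(range(-T, T + 1)):
            col = f * width + j
            if k >= 0:
                phi[k:, col] = X[:N - k, f]   # event precedes the response
            else:
                phi[:N + k, col] = X[-k:, f]  # event follows the response
    return phi
```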

To evaluate GLM performance, trials were partitioned into training and test datasets, each containing 50% of the data. For the leave-out and hyperparameter analyses, the models were run Y times; for each run, the data were split into training and test datasets, and this split was held constant for all models tested in that run. Y = 10 for the leave-out analysis (Extended Data Fig. 5f) and Y = 3 for the hyperparameter analysis (Extended Data Fig. 5e,i). For each run, a 10-fold group shuffle split (GSS) by trial was applied to the training set, with an 80–20 training–validation split within each fold, to obtain cross-validated ranges for the MSEs. Each validation MSE value in the box plots is the mean of the concatenated squared residuals across all validation data points in these 10 GSS folds. Finally, the model was refit to the entire training dataset, and this refit model was evaluated on both the training and the test datasets, yielding training and test MSEs and R2 values for each run. The R2 values presented in the text are test-set values averaged across the Y runs. These values typically had small variance, with maximum-to-minimum ranges of less than 1.2%; the ranges are therefore not stated in the text.
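The evaluation scheme above can be sketched with scikit-learn. It consists of a 50/50 train–test split by trial, a 10-fold group shuffle split with an 80–20 training–validation split inside the training set, pooled validation residuals and a final refit on the full training set. Ridge regression is used only as an example model. The function name, random seed and α value are assumptions, and the trial-boundary duplication described above is not reproduced.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GroupShuffleSplit


def evaluate_glm(phi, y, trial_ids, alpha=1.0, seed=0):
    """Train/validation/test MSEs and test R2 for one model run (sketch)."""
    # Outer 50/50 split by trial; a fixed seed keeps the split identical
    # for every model compared within this run.
    outer = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=seed)
    train_idx, test_idx = next(outer.split(phi, y, groups=trial_ids))
    X_tr, y_tr, g_tr = phi[train_idx], y[train_idx], trial_ids[train_idx]

    # 10 GSS folds by trial, 80-20 within the training set; residuals pooled.
    inner = GroupShuffleSplit(n_splits=10, test_size=0.2, random_state=seed)
    sq_resid = []
    for fit_idx, val_idx in inner.split(X_tr, y_tr, groups=g_tr):
        model = Ridge(alpha=alpha).fit(X_tr[fit_idx], y_tr[fit_idx])
        sq_resid.append((y_tr[val_idx] - model.predict(X_tr[val_idx])) ** 2)
    val_mse = float(np.mean(np.concatenate(sq_resid)))

    # Refit on the whole training set, then score on training and test data.
    final = Ridge(alpha=alpha).fit(X_tr, y_tr)
    y_hat_te = final.predict(phi[test_idx])
    return {
        "val_mse": val_mse,
        "train_mse": mean_squared_error(y_tr, final.predict(X_tr)),
        "test_mse": mean_squared_error(y[test_idx], y_hat_te),
        "test_r2": r2_score(y[test_idx], y_hat_te),
    }
```

To mirror the averaging across model runs described in the text, the function can be called with Y different seeds and the resulting `test_r2` values averaged.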

For each of the models used, the algorithms minimize an associated cost function with respect to the fitted coefficients. The cost functions are as follows, where J is the cost function to be minimized, X is the design matrix (time-shifted task and behavioural events), y is the response vector (fluorescence indicator signal), β is the set of fitted coefficients, \({\parallel \,a\parallel }_{2}^{2}\) is the sum of the squared entries in vector a, \({\parallel a\parallel }_{1}\) is the sum of the absolute values of the entries in vector a, α is the regularization parameter and λ is the L1 ratio.

Ordinary least squares (OLS):

$$J(X,y)={\parallel \,y-X\beta \,\parallel }_{2}^{2}$$

Ridge regression (L2):

$$J(X,y)={\parallel \,y-X\beta \,\parallel }_{2}^{2}+\alpha {\parallel \,\beta \,\parallel }_{2}^{2}$$

Elastic net and lasso regression (L1):

$$J(X,y)=\frac{1}{2N}{\parallel \,y-X\beta \parallel }_{2}^{2}+\alpha \left(\lambda {\parallel \beta \parallel }_{1}+\frac{1}{2}(1-\lambda ){\parallel \beta \parallel }_{2}^{2}\right)$$

Note that for OLS, α = 0 because there is no regularization. Setting λ = 1 yields lasso regression (L1 regularization). However, setting λ = 0 does not recover the version of ridge regression given above, because the elastic-net cost scales the squared-error term by 1/(2N) and the L2 penalty by 1/2; the two formulations therefore use different α scales (Extended Data Fig. 5e,i). In addition, for L2 regularization, the validation models were fit to 80% of the samples available to the final model. Because the ridge cost above is not normalized by the number of samples, the same α acts as a relatively stronger penalty on fewer samples, so the validation models performed worse than their training or test counterparts.
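To make the difference in α scales concrete, the elastic-net cost with λ = 0 can be rescaled. Multiplying a cost by a positive constant does not change its minimizer. The subscripts EN and ridge, introduced here only for clarity, distinguish the elastic-net and ridge regularization parameters:

$$J_{\mathrm{EN}}(X,y)\big|_{\lambda=0}=\frac{1}{2N}{\parallel \,y-X\beta \parallel }_{2}^{2}+\frac{\alpha _{\mathrm{EN}}}{2}{\parallel \beta \parallel }_{2}^{2}$$

$$2N\,J_{\mathrm{EN}}(X,y)\big|_{\lambda=0}={\parallel \,y-X\beta \,\parallel }_{2}^{2}+N\alpha _{\mathrm{EN}}{\parallel \,\beta \,\parallel }_{2}^{2}$$

The right-hand side is the ridge cost with \(\alpha _{\mathrm{ridge}}=N\alpha _{\mathrm{EN}}\), so the two fits share a minimizer only when their α values differ by a factor of N. For example, with a hypothetical N = 10^4 time bins, an elastic-net α of 10^−3 corresponds to a ridge α of 10.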

The sources for the least squares regression models are listed below:

OLS: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html.

L2: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html.

L1 and elastic net: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html.

All kernels (β coefficients) depicted are the mean coefficients across the Y model runs with one standard deviation above and below the mean represented in the shaded regions. All GLM reconstructions depict the average signal with an overlay of the bootstrapped 95% confidence intervals as the upper and lower bounds (shaded region).
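A percentile bootstrap of a trial-averaged trace is one common way to obtain 95% confidence bands like those described above; it is sketched below. Resampling whole trials with replacement, the use of 1,000 resamples and the function name are assumptions, because the text specifies neither the resampling unit nor the number of resamples.

```python
import numpy as np


def bootstrap_mean_ci(trials, n_boot=1000, ci=95.0, seed=0):
    """Mean trace and percentile-bootstrap confidence bounds.

    trials: array (n_trials, n_timepoints). Trials are resampled with
    replacement, and the mean trace is recomputed for each resample.
    """
    trials = np.asarray(trials, dtype=float)
    rng = np.random.default_rng(seed)
    n = trials.shape[0]
    idx = rng.integers(0, n, size=(n_boot, n))  # resampled trial indices
    boot_means = trials[idx].mean(axis=1)       # (n_boot, n_timepoints)
    tail = (100.0 - ci) / 2.0
    lower, upper = np.percentile(boot_means, [tail, 100.0 - tail], axis=0)
    return trials.mean(axis=0), lower, upper
```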

Preparation of acute brain slices

Brain slices were obtained from two- to four-month-old mice (both male and female) using standard techniques. Mice were anaesthetized by isoflurane inhalation and subjected to cardiac perfusion with ice-cold artificial cerebrospinal fluid (ACSF) containing 125 mM NaCl, 2.5 mM KCl, 25 mM NaHCO3, 2 mM CaCl2, 1 mM MgCl2, 1.25 mM NaH2PO4 and 25 mM glucose (295 mOsm kg−1). Brains were blocked and transferred into a slicing chamber containing ice-cold ACSF. Sagittal slices of striatum for amperometric or cell-attached recordings were cut at 300 μm thickness with a Leica VT1000 S vibratome in ice-cold ACSF, transferred for 10 min to a holding chamber containing choline-based solution (consisting of 110 mM choline chloride, 25 mM NaHCO3, 2.5 mM KCl, 7 mM MgCl2, 0.5 mM CaCl2, 1.25 mM NaH2PO4, 25 mM glucose, 11.6 mM ascorbic acid and 3.1 mM pyruvic acid) at 34 °C, then transferred to a secondary holding chamber containing ACSF at 34 °C for 10 min and subsequently maintained at room temperature (20–22 °C) until use. All recordings were obtained within 4 h of slicing. Both choline solution and ACSF were constantly bubbled with 95% O2/5% CO2.

Cell-attached recordings

Acute sagittal brain slices were prepared and electrophysiological recordings were obtained from the dorsal striatum as previously described50, with the following variations. CINs were identified by their morphological and electrophysiological features14. Slices were maintained in ACSF containing 10 µM each of gabazine, CPP and NBQX (Tocris). For cell-attached recordings, the bath temperature was maintained at 34 °C; pipettes were filled with ACSF and had a resistance of 1–2 MΩ, and seal resistances ranged from 10 to 100 MΩ. Action potential firing was monitored in the cell-attached configuration in voltage-clamp mode (Vhold = 0 mV). ChR2 was activated by a single 2-ms pulse of 473-nm light delivered at 5.74 mW using full-field illumination through the objective at 120-s intervals.
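Perievent spike histograms like those in Fig. 4f could be computed from cell-attached traces by detecting action currents and binning spike times relative to light onset. The sketch below is one such approach and not the published analysis. All of its choices are hypothetical: a robust threshold of k = 6 times a noise estimate based on the median absolute deviation, a 2-ms dead time to avoid double-counting biphasic action currents, the bin width and the function names.

```python
import numpy as np


def detect_spike_times(trace, fs, k=6.0, dead_time_ms=2.0):
    """Times (s) at which |trace| first crosses a robust threshold."""
    x = np.asarray(trace, dtype=float)
    x = x - np.median(x)
    sigma = np.median(np.abs(x)) / 0.6745  # MAD-based noise estimate
    above = np.abs(x) > k * sigma
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    min_gap = dead_time_ms * 1e-3 * fs
    kept, last = [], -np.inf
    for i in onsets:
        if i - last > min_gap:  # ignore the second lobe of the same spike
            kept.append(i)
            last = i
    return np.asarray(kept, dtype=float) / fs


def perievent_rate(spike_times_per_sweep, stim_times, window=(-1.0, 1.0), bin_s=0.1):
    """Bin centres and mean firing rate (Hz) aligned to each sweep's stimulus."""
    n_bins = int(round((window[1] - window[0]) / bin_s))
    edges = np.linspace(window[0], window[1], n_bins + 1)
    counts = [np.histogram(np.asarray(st) - t0, bins=edges)[0]
              for st, t0 in zip(spike_times_per_sweep, stim_times)]
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, np.mean(counts, axis=0) / bin_s
```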

Amperometry recordings

Slices were stimulated with 593-nm light delivered at 5.86 mW for 2 ms using full-field illumination through the objective at 180-s intervals. Constant-potential amperometry was performed as previously described50. In brief, glass-encased carbon-fibre microelectrodes (CFE1011 from Kation Scientific: 7 μm diameter, 100 μm length) were inserted approximately 50–100 μm into dorsal striatum slices and held at a constant voltage of +600 mV versus Ag/AgCl for 9 s by a Multiclamp 700B amplifier (Molecular Devices). Electrodes were calibrated with fresh 5 μM dopamine standards in ACSF to determine the sensitivity of each carbon-fibre microelectrode and to allow conversion of current amplitude to extracellular dopamine concentration.
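The calibration described above implies a linear conversion. The electrode's sensitivity is the current evoked by the 5 μM standard divided by 5 μM, and the extracellular DA concentration is the baseline-subtracted current divided by that sensitivity. The sketch assumes a linear electrode response over the relevant range and baseline subtraction from a pre-stimulus window. The function name and example currents are hypothetical.

```python
import numpy as np


def current_to_dopamine_uM(trace_nA, baseline_slice, standard_current_nA,
                           standard_conc_uM=5.0):
    """Convert an amperometric current trace (nA) to DA concentration (µM).

    Assumes a linear response: sensitivity (nA per µM) is the current evoked by
    the dopamine standard divided by its concentration.
    """
    sensitivity_nA_per_uM = standard_current_nA / standard_conc_uM
    trace = np.asarray(trace_nA, dtype=float)
    baseline = trace[baseline_slice].mean()  # pre-stimulus baseline current
    return (trace - baseline) / sensitivity_nA_per_uM
```

For example, if the 5 μM standard evoked 2.0 nA (0.4 nA per μM), an evoked baseline-subtracted current of 0.3 nA would correspond to 0.75 μM dopamine.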

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.