Bridging the gap between striatal plasticity and learning

The striatum, the main input nucleus of the basal ganglia, controls goal-directed behavior and procedural learning. Striatal projection neurons integrate glutamatergic inputs from cortex and thalamus together with neuromodulatory systems, and are subjected to plasticity. Striatal projection neurons exhibit bidirectional plasticity (LTP and LTD) when exposed to Hebbian paradigms. Importantly, correlative and even causal links between procedural learning and striatal plasticity have recently been shown. This short review summarizes the current view on striatal plasticity (with a focus on spike-timing-dependent plasticity), recent studies aiming at bridging in vivo skill acquisition and striatal plasticity, the temporal credit-assignment problem, and the gaps that remain to be filled.


Elodie Perrin 1,2 and Laurent Venance 1,2

Introduction
The striatum receives topographic glutamatergic afferents from all cortical areas and from some thalamic nuclei [1] (Figure 1). It is an important site for action selection and procedural memory formation [2]. Since the demonstration by Yin and colleagues [3] of striatal plasticity following acquisition of a procedural skill, several studies have extended this pioneering work by assessing striatal plasticity across various learning tasks. This review presents the current view on ex vivo striatal plasticity in the light of recent studies showing correlative or causal links between in vivo learning and striatal plasticity in a physiological context. Here, 'ex vivo' refers to brain slice recordings from animals subjected to training or treatment, as opposed to studies in which brain slices from naïve animals are examined to reveal plasticity mechanisms. Striatal plasticity has been a controversial field for at least two decades because of the great variety of results (reviewed in Refs. [4][5][6][7][8]). The rise of back-and-forth investigations between in vivo and ex vivo approaches offers a unique opportunity to better understand striatal plasticity and, most importantly, to bridge the gap between learning and striatal plasticity.

Striatal complexity
Three main reasons account for the diversity of results concerning striatal plasticity: the induction protocols (rate-coded versus time-coded and Hebbian versus non-Hebbian), striatal heterogeneity, and technical issues. Critical technical issues include the age of the animals, the slice orientation (coronal versus sagittal versus horizontal), the location of the stimulation electrode (cortex versus corpus callosum versus striatum) and the rate of washout of extracellular and intracellular components (LTP being optimally observed under sufficient rates of superfusion and with high-resistance whole-cell recordings). Intermingled anatomo-functional compartments and neuronal units form the basis of striatal heterogeneity: dorsolateral and dorsomedial striatum (DLS and DMS), and direct and indirect trans-striatal pathways, to cite only the main ones that can be assessed during recordings (Figure 1). DMS and DLS receive inputs from associative and sensorimotor cortices and encode goal-directed behavior and skill acquisition, respectively [3,9]. In rodents, striatal projection neurons (SPNs) belong either to the direct (d-SPNs) or indirect (i-SPNs) trans-striatal pathway and express distinct dopamine receptors, D1-class and D2-class, respectively [10]. Recent studies show that d-SPNs and i-SPNs are engaged in a complementary and coordinated manner for action initiation and execution [11][12][13]. DMS/DLS and d-SPNs/i-SPNs are distinguished in the majority of plasticity studies. Nevertheless, the third level of striatal organization, the striosome (patch)/matrix compartments [14], remains poorly documented with respect to striatal plasticity. Another compartment has recently been added: the annular compartments surrounding the striosomes [15] (Figure 1).
Functionally, substance P increases dopamine release within the striosomes, decreases it in the annular compartment, and leaves it unchanged in the matrix [15,16], suggesting distinct neuromodulation of striatal plasticity among these compartments. We first review the links between learning and striatal plasticity, and then, from these studies, discuss the conditions of emergence of bidirectional striatal plasticity.

From learning to striatal plasticity
Striatal plasticity has been assessed during goal-directed behavior, and across the early and late phases of procedural learning. Various parameters, used as proxies for synaptic plasticity, have been analyzed either in vivo during behavioral tasks (firing rate and activity coherence [9,13,17,18,19,20]; opto-induced LFP [21]) or ex vivo after behavioral training (NMDAR/AMPAR ratio [3,22,23]; spontaneous EPSCs [24]; saturation/occlusion plasticity tests [25,26]) (Figure 2). The link between the acquisition/consolidation of procedural learning and striatal plasticity was first shown by the combined analysis of in vivo firing rate and ex vivo NMDAR/AMPAR ratio in mice trained on an accelerating rotarod [3]. In vivo analysis shows that DMS, but not DLS, displays increased activity during the early phases of skill acquisition, whereas the reverse is observed during the consolidation phases: DLS displays increased firing activity while DMS returns to naïve levels. Interestingly, the NMDAR/AMPAR ratio varies only in DLS for the consolidation phase [3], pointing to the non-NMDAR nature of the corticostriatal plasticity in DMS for the early phases. Ex vivo saturation/occlusion experiments after extended training show LTP at i-SPNs but not at d-SPNs, suggesting that LTP was already induced (and is therefore occluded) at d-SPNs during the consolidation phase [3]. Ex vivo AMPAR/NMDAR ratio analysis revealed that during a T-maze task, LTP (but not LTD) is engaged in DMS in the early phase while LTD (but not LTP) is involved in the late phase, whereas in DLS, LTD is involved only in the late phase (LTP in DLS was not explored) [25]. After habit learning using a lever-pressing task (corresponding to the late phase described in [3,25]), ex vivo spontaneous EPSCs are specifically decreased in DLS i-SPNs (indicative of a postsynaptic LTD) [24].
In a serial order task, learning of a precise sequence depends on d-SPNs in the DLS and induces an increase of the AMPAR/NMDAR ratio at their synapses (but not at i-SPNs) [23]. In a goal-directed task, ex vivo AMPAR/NMDAR ratio analysis reveals opposing plasticity in d-SPNs and i-SPNs in DMS, while no modification is observed in DLS [22]. During the learning of a sensory discrimination task, LTP is detected in vivo and ex vivo in the auditory striatum [21].
Figure 1. Schematic representation of the striatal heterogeneity and the anatomo-functional compartments of the dorsal striatum. Schematic representation of the direct and indirect trans-striatal pathways of the basal ganglia. Striosomes are shown with black dots distributed between the dorsolateral striatum (blue) and the dorsomedial striatum (orange). Grouped black dots represent striosomes surrounded by the annular compartment (red line, [15]), whereas isolated black dots illustrate the exo-patch [14]. Striosomal SPNs mainly project to SNc whereas SPNs from the matrix belong to the direct or indirect pathway, identified respectively by the expression of D1 and D2 receptors. The direct and indirect pathways are represented, respectively, in green and purple. GPe, external segment of the globus pallidus; EP, entopeduncular nucleus; STN, subthalamic nucleus; SNr, substantia nigra pars reticulata; SNc, substantia nigra pars compacta.
Learning abstract routines, such as neuroprosthetic skills, requires corticostriatal plasticity, as revealed by in vivo firing coherence between motor cortex and DLS [9,19,20]. Firing rate in DLS and theta-band coherence between motor cortex (M1) and DLS increase in the late phase of volitional modulation of M1 activity [9,19]; these phenomena are NMDAR-mediated since no change occurs in NMDAR-knockout mice [19]. In addition, the firing activity patterns of neuronal ensembles that trigger maximal dopamine release across trials of the neuroprosthetic task are selected and progressively shaped for optimized reinforcement learning [20].
A full picture of the plasticities successively engaged in d-SPNs and i-SPNs in DLS and DMS, from goal-directed behavior to habit formation, remains to be established.
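As a rough illustration of the AMPAR/NMDAR ratio used as a plasticity proxy in several of the studies above, the following Python sketch computes the ratio from EPSC amplitudes. The function name `ampar_nmdar_ratio` and the amplitude values are hypothetical and invented for illustration; how each component is isolated (holding potential, latency, pharmacology) varies between laboratories and is not specified here.

```python
from statistics import mean

def ampar_nmdar_ratio(ampar_peaks_pa, nmdar_amps_pa):
    """Ratio of mean AMPAR-mediated to mean NMDAR-mediated EPSC amplitude.

    Amplitudes are taken as absolute values in pA. The way each component
    is isolated is an assumption left to the experimenter.
    """
    if not ampar_peaks_pa or not nmdar_amps_pa:
        raise ValueError("need at least one sweep per component")
    ampar = mean(abs(a) for a in ampar_peaks_pa)
    nmdar = mean(abs(n) for n in nmdar_amps_pa)
    if nmdar == 0:
        raise ValueError("NMDAR component is zero; ratio undefined")
    return ampar / nmdar

# Hypothetical sweeps (pA), invented for illustration only.
naive = ampar_nmdar_ratio([-100.0, -110.0, -90.0], [50.0, 55.0, 45.0])
trained = ampar_nmdar_ratio([-150.0, -160.0, -140.0], [50.0, 55.0, 45.0])
print(f"naive: {naive:.2f}, trained: {trained:.2f}")
```

A higher ratio in trained than in naïve animals is commonly read as consistent with postsynaptic potentiation, but the ratio alone does not identify the underlying mechanism.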

From striatal plasticity to learning
A reverse strategy consists in triggering LTP or LTD during a behavioral task to investigate causality between synaptic plasticity and behavioral modifications. NMDAR-LTP and endocannabinoid-LTD were induced in DMS by presynaptic optogenetic stimulation paired with optogenetic SPN depolarization during operant alcohol self-administration [27]. LTP and LTD induction promote, respectively, a long-lasting increase and decrease in alcohol-seeking behavior [27]. This demonstrates a causal link between the polarity of an induced plasticity and its effect on behavior. Moreover, this strategy makes it possible to test the plasticities identified in brain slices in SPNs or interneurons, and to investigate their in vivo impact on synaptic transmission [21] or behavior [27].

Emergence of bidirectional striatal plasticity: Hebbian mode is the key
The observation that LTP can be induced in vivo using a Hebbian paradigm changed the view of an LTD dominance in the striatum [28][29][30]. Since then, numerous studies have reported LTP (as well as LTD) depending on the stimulation protocol (for reviews see Refs. [4][5][6][7][8]). Nevertheless, striatal LTP still appears more capricious to induce than LTD. Interestingly, following in vivo learning tasks, LTP is systematically detected [3,21,23,25,26]. Clearly, the induction phase matters, and LTP appears more likely to be induced with Hebbian protocols. Hebbian plasticity relies on quasi-coincident activity on either side of the synapse, and spike-timing-dependent plasticity (STDP) protocols aim at mimicking such a Hebbian mode by pairing presynaptic stimulations with postsynaptic back-propagating action potentials [31,32]. Most striatal STDP studies report bidirectional (LTP and LTD) plasticity [33-39,40,41-44] (Figure 3). Note that in rate-coded protocols, the removal of external magnesium or postsynaptic depolarization for inducing LTP aims at mimicking a Hebbian mode.
Figure 2. Strategies used for evaluating striatal plasticity during learning. Illustration of the analytical methods and parameters used to assess synaptic and structural striatal plasticity during procedural learning. In vivo electrophysiological extracellular recordings, two-photon imaging, GRIN lens imaging and fiber photometry allow data collection throughout the learning phases of behavioral tasks (continuous recordings schematized by the horizontal pink arrow). At time points of interest during the behavioral tasks (discontinuous recordings schematized by the blue dots), ex vivo patch-clamp and two-photon imaging on acute brain slices can be used to test whether various forms of striatal plasticity were induced in vivo. Examples of in vivo and ex vivo recordings for striatal plasticity assessment during procedural learning can be found in [9,13,17,18,19,20,21,68] and [3,22,23,24,25,26], respectively.
It should be noted that the plasticities observed in vitro (brain slices) and in vivo relate to distinct phases. Studies using brain slices investigate plasticity up to one hour post-protocol, thus referring to early plasticity, whereas AMPAR/NMDAR ratio and occlusion/saturation analyses are performed 1-3 days after the learning task, corresponding to the long-lasting phase of plasticity. Therefore, comparing conclusions drawn from in vitro and ex vivo/in vivo plasticity is not straightforward, and it remains to be determined whether similar signaling pathways are engaged during early and late plasticity phases.
In Hebbian plasticity, the association of two factors controls the synaptic strength, that is, two inputs (and/or activity patterns) on the presynaptic and postsynaptic elements, with the addition of a third factor modulating plasticity [48]. Here, recent studies on GABA and dopamine acting as third factors for striatal STDP help to clarify the plasticity debate.
The conflicting observations of Hebbian [34,35] or anti-Hebbian striatal STDP (in brain slices [33,36,39,41,45] or in vivo [37]) are explained by the use (or lack of use) of GABA antagonists [38,43]. The appearance of tonic GABAergic signaling during development gates STDP polarity, promoting anti-Hebbian STDP in the adult striatum [43]. It remains to investigate pathological effects of tonic GABAergic transmission in striatal plasticity and procedural learning.
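The distinction between Hebbian and anti-Hebbian STDP discussed above can be sketched with a generic pair-based exponential window. This is a textbook-style toy model, not a model taken from the cited studies: the function name `stdp_weight_change`, the amplitudes and the 20 ms time constant are illustrative assumptions. The `polarity` argument merely stands in for the GABA-dependent gating; which experimental condition yields which polarity should be taken from the cited work, not from this sketch.

```python
import math

def stdp_weight_change(dt_ms, polarity="hebbian",
                       a_plus=0.5, a_minus=0.5, tau_ms=20.0):
    """Relative weight change for a single pre/post pairing.

    dt_ms = t_post - t_pre, so dt_ms > 0 is a pre-post pairing.
    'hebbian': pre-post gives potentiation, post-pre gives depression.
    'anti-hebbian': the same window with inverted sign.
    All parameter values are illustrative assumptions.
    """
    if polarity not in ("hebbian", "anti-hebbian"):
        raise ValueError("polarity must be 'hebbian' or 'anti-hebbian'")
    if dt_ms == 0:
        return 0.0
    if dt_ms > 0:
        dw = a_plus * math.exp(-dt_ms / tau_ms)
    else:
        dw = -a_minus * math.exp(dt_ms / tau_ms)
    return dw if polarity == "hebbian" else -dw

for dt in (-30.0, -10.0, 10.0, 30.0):
    h = stdp_weight_change(dt, "hebbian")
    a = stdp_weight_change(dt, "anti-hebbian")
    print(f"dt = {dt:+5.1f} ms  hebbian {h:+.3f}  anti-hebbian {a:+.3f}")
```

Treating polarity as an explicit switch keeps the timing window and its sign separate, which mirrors the idea that a third factor can invert the outcome of an otherwise identical pairing protocol.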
Dopamine is crucial for action selection and supervised learning [49]. Striatal Hebbian plasticity requires dopamine (brain slices [34,35,50]; in vivo [37,42]). Dendritic spine enlargement and an increase of calcium occur when dopamine is released concomitantly with or shortly after (1 s) glutamate [51], allowing in vivo STDP [37,42]. There are conflicting results concerning the involvement of D1R versus D2R in STDP (post-pre LTD and pre-post LTP require D1R activation but not D2R activation [35]; LTP in d-SPNs is D1R-mediated and LTD in i-SPNs is D2R-mediated [34]; reviewed in Refs. [6][7][8]). In the absence of dopamine (and with GABA antagonists), D1-SPNs show LTD instead of LTP with pre-post pairings, whereas D2-SPNs display LTP for both post-pre and pre-post pairings [34]. Methodological differences such as the location of the stimulation electrode (leading to different dopamine release [52]) or the number and frequency of paired stimulations could account for differential activation of D1R and D2R and specific regulation of the back-propagating action potential [5]. This is particularly illustrated by the fact that LTP induced by theta-burst optogenetic stimulation depends on presynaptic NMDAR and BDNF [53] but not on dopamine, whereas LTP induced with electrical high-frequency stimulation is generally dopamine-dependent (reviewed in Refs. [4][5][6][7][8]).
In future studies, it will be crucial to investigate in behaving animals the action of third factors [48] in Hebbian learning, as for eligibility traces (see next section).

Solving the temporal credit-assignment problem with eligibility traces and striatal plasticity
The temporal credit-assignment problem questions the temporal link between the reward and the preceding action to allow reinforcement learning [49]. The existence of eligibility traces, originally proposed in computational models [54][55][56], helps to solve the temporal credit-assignment problem. Eligibility traces are synaptic tags induced by Hebbian learning and are transformed into synaptic plasticity by the retroactive effect of neuromodulators. Theoretically, eligibility traces keep a synaptic trace of the learning sequence but do not promote plasticity per se, unless the reward signal occurs before the traces extinguish (Figure 4). Therefore, eligibility traces temporally link the learning sequence with the reward, allowing the induction of reinforcement learning via striatal plasticity. Structural plasticity, used as a proxy for synaptic plasticity, occurs exclusively when dopamine release happens 0.3-2 s after an STDP paradigm [51] (Figure 4). D1R and dendritic PKA activation bridge the action (glutamatergic inputs) and the subsequent reward (dopamine); PKA activation is short-lived because of the high phosphodiesterase-10A activity in distal dendrites [51]. Dopamine also exerts retroactive effects on existing plasticity, since dopamine delivered 2 s after a cell-conditioning protocol (and importantly not before or during the protocol) converts LTD into LTP [42,52].
Therefore, the expression of eligibility traces and the delivery of a distal reward allow the expression of plasticity [51] or even the conversion of one form of plasticity into another [42,52].
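The logic of an eligibility trace read out by a delayed dopamine signal can be sketched as a three-factor rule. The toy model below is an assumption-laden illustration, not the mechanism reported in the cited studies: the function names, the difference-of-exponentials shape and the time constants (0.2 s rise, 1 s decay) are hypothetical choices made only so that the trace builds up, is maintained for a short time and then extinguishes.

```python
import math

def eligibility_trace(t_s, tau_rise_s=0.2, tau_decay_s=1.0):
    """Trace amplitude t_s seconds after a Hebbian pairing at t = 0.

    Difference of exponentials: zero at t = 0 (build-up), then a
    transient that fades (extinction). Zero before the pairing.
    Time constants are illustrative, not fitted values.
    """
    if t_s <= 0:
        return 0.0
    return math.exp(-t_s / tau_decay_s) - math.exp(-t_s / tau_rise_s)

def weight_change(dopamine_delay_s, learning_rate=1.0):
    """Three-factor update: dopamine reads out the trace when it arrives."""
    return learning_rate * eligibility_trace(dopamine_delay_s)

for delay in (-1.0, 0.05, 0.5, 2.0, 5.0):
    print(f"dopamine at {delay:+.2f} s -> weight change {weight_change(delay):.3f}")
```

In this sketch, dopamine arriving before the pairing produces no change, and dopamine arriving long after it produces almost none. Unlike the all-or-none picture of Figure 4, the model is graded: dopamine during the build-up still yields a small change, so a threshold would have to be added to reproduce a strict window.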

Future directions
Among the striatal compartments, the striosome and the matrix remain the least documented in terms of physiological role in striatal function. Thanks to new markers [14], in vivo two-photon monitoring during task performance showed overlapping responses of neurons belonging to the striosome and the matrix, with differential firing activity for reward coding [57]. Because of the differential inputs and outputs of striosome and matrix [14], and their differential dopaminergic [15,16] and endocannabinoid [58] regulation, specific plasticities with differential modulation are expected to occur.
Neurobiology of learning and plasticity
Figure 4. Eligibility traces bridge the gap between learning sequence and subsequent reward to promote reinforcement learning. Illustration of reinforcement learning allowed by short-lived eligibility traces at corticostriatal synapses. Eligibility traces are triggered following a Hebbian sequence, which per se does not induce plasticity (illustrated here by the flat grey line coding for the synaptic weight). These eligibility traces (constituted by PKA activity controlled by phosphodiesterase-10A in d-SPNs, as reported by [45]) can be transformed into plasticity before their extinction if the teaching signal (dopamine in striatum) is delivered during the maintenance phase (3) of the eligibility traces; no plasticity is induced if dopamine is released before (1), during the build-up of (2) or after (4) the eligibility traces (vertical bars illustrate the neuronal firing).
Although differential polarity of STDP in GABAergic and cholinergic interneurons has been shown [59], the full picture of the interplay of plasticities in striatal circuits is just beginning to be understood. The mechanisms underlying the plasticities occurring at lateral connections [60,61], such as interneuron-interneuron, interneuron-SPN and SPN-SPN synapses, remain to be investigated further. For example, a study showing LTD at inhibitory SPN-SPN and fast-spiking interneuron-SPN synapses demonstrates that distinct endocannabinoid signaling pathways are engaged depending on membrane potential (up versus down states) [62]. Additionally, with the discovery of long-range projecting corticostriatal GABAergic neurons modulating motor activity via differential action on d-SPNs and i-SPNs [63], it will be important to take into account that cortical activation leads to the direct release not only of glutamate but also of GABA into the striatum. This changes our view of the striatal excitation-inhibition balance, and raises the question of whether plasticity, if any, occurs between these long-range GABAergic neurons and the d-SPNs and i-SPNs.
Determining the conditions of emergence of plasticity helps to better understand the striatal capability for storage and recall of information. Noisy STDP pairings show that plasticity robustness depends on the signaling pathways: NMDAR-LTP is more fragile than endocannabinoid plasticity [44]. Interestingly, the resistance of NMDAR-tLTP to noisy patterns increases with a higher frequency or number of pairings. In vivo Hebbian plasticity thus appears as a multivariate function of the number and frequency of pairings, but also of the variability of the spike timing. In-vivo-like conditions for striatal plasticity, using naturalistic firing patterns of cortical/thalamic/striatal neurons recorded in learning tasks, still need to be explored. Although STDP aims at mimicking Hebbian learning, reservations have been expressed about its physiological validity [64]. Input-timing-dependent plasticity constitutes a Hebbian upgrade of STDP. It consists in the paired activation of presynaptic inputs (distinct cortical areas and/or thalamic nuclei, for example), leading to subthreshold or suprathreshold activity in the postsynaptic neuron, as performed recently in avian basal ganglia [65].
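The sensitivity of plasticity to spike-timing variability mentioned above can be illustrated with a deliberately simple simulation: pairings with Gaussian timing jitter are passed through a generic exponential Hebbian window and the resulting weight changes are averaged. This is not the signaling-pathway model of the cited work, and it does not capture the dependence on pairing frequency or on the induction pathway; the function names, the 15 ms mean delay and the 20 ms window are hypothetical.

```python
import math
import random

def pairing_change(dt_ms, amplitude=1.0, tau_ms=20.0):
    """Hebbian exponential window; dt_ms = t_post - t_pre."""
    if dt_ms == 0:
        return 0.0
    sign = 1.0 if dt_ms > 0 else -1.0
    return sign * amplitude * math.exp(-abs(dt_ms) / tau_ms)

def mean_change(n_pairings, dt_ms, jitter_sd_ms, seed=0):
    """Average weight change over pairings with jittered spike timing."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(n_pairings):
        total += pairing_change(rng.gauss(dt_ms, jitter_sd_ms))
    return total / n_pairings

for sd in (0.0, 5.0, 15.0, 30.0):
    change = mean_change(1000, 15.0, sd)
    print(f"jitter SD {sd:4.1f} ms -> mean change {change:+.3f}")
```

When the jitter is large relative to the mean delay, a sizeable fraction of pairings lands on the opposite side of the window and the average change shrinks, which is one simple way in which timing variability can erode a timing-dependent plasticity.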
Calcium imaging of i-SPNs and d-SPNs recorded ex vivo just after different phases of an operant lever-pressing task revealed that i-SPNs fired before d-SPNs in the goal-directed phase, whereas the reverse is observed during the habitual phase [13]. GABAergic fast-spiking interneurons become more excitable in habitual behavior [66,67] and could account for the reversed temporal order of firing between d-SPNs and i-SPNs. Plasticity in DLS remains to be examined across the transition from goal-directed to habitual behavior, at d-SPNs and i-SPNs and also in GABAergic interneurons. In addition, most studies have focused on the involvement of NMDAR-LTP in learning. Given the diversity of plasticities revealed by studies in brain slices (Figure 3), the role of endocannabinoid-LTD and -LTP [39,40,47] across learning needs to be evaluated. Supporting this view, a recent study showed that endocannabinoids set the transition between goal-directed behavior and habit formation via the control of orbitofrontal cortex-striatal synaptic weight [68]; the nature of the endocannabinoid plasticity at play at these synapses, allowing the shift from goal-directed behavior to habits, remains to be determined.
Attempts are being made to link the complexity of striatal STDP and goal-directed behavior by elaborating computational models (for recent examples see [42,69]). In future studies, it will be necessary to upgrade the models with recent experimental findings such as, to name a few, the lateral connections [60,61], the new faces of striatal STDP [39,40,44,47], the key role of striatal GABAergic interneurons in procedural learning [66,67] and the features of eligibility traces [42,51,52].
One way to approach in vivo striatal plasticity during learning is to analyze cortico-striatal synchronous oscillations. However, because the striatum lacks a laminar organization, these oscillations can be contaminated by volume-conducted signals [70], leading to inaccurate interpretation. To overcome this, an elegant strategy consists in the specific expression of channelrhodopsin in corticostriatal pyramidal cells, which offers the unique possibility of estimating striatal opto-LFP changes in vivo during skill learning [21]. Another strategy is the use of fiber photometry to monitor upstream activity in cortical inputs arising from distinct cortices [71] (Figure 2). In vivo patch-clamp recordings in awake and behaving (head-fixed) rodents allow single-cell resolution and the collection of subthreshold and suprathreshold events [72]. This approach has been used to analyze membrane potential dynamics during goal-directed behavior in d-SPNs and i-SPNs [73].
The field of striatal plasticity has come to a new age in which the investigation of intrinsic, synaptic and structural plasticity at play across procedural learning (from goal-directed behavior to habits) and across the striatal anatomo-functional complexity has become possible in behaving rodents. A new period of (constructive) debates is expected since various forms of plasticity should arise depending not only on the striatal complexity but also on the behavioral task and the related learning phase (early versus late).

Conflict of interest statement
Nothing declared.

19. Koralek AC, Jin X, Long JD, Costa RM, Carmena JM: Corticostriatal plasticity is necessary for learning intentional neuroprosthetic skills. Nature 2012, 483:331-335.
Building on their seminal publication (Yin et al., 2009), Costa's lab demonstrates the involvement of dorsal striatum and of striatal synaptic plasticity in an abstract skill learning task using neuroprosthetic action in behaving rodents, that is, volitional modulation of M1 neural activity using auditory feedback. Using this new goal-directed behavioral task, they identified an increased coherence of the spiking activity in the theta band between the motor cortex and the striatum during the late phase of learning, accompanied by an ex vivo modification of the NMDAR/AMPAR ratio.

21. Xiong Q, Znamenskiy P, Zador AM: Selective corticostriatal plasticity during acquisition of an auditory discrimination task. Nature 2015, 521:348-351.
The authors show the specific engagement of striatal synaptic plasticity in striatal neurons coding for low or high frequency during the acquisition of an auditory frequency discrimination task. They set up an elegant technique to assess synaptic efficacy changes in behaving animals by estimating the opto-LFP (with the expression of ChR2 in auditory cortex). This study paved the way for future work assessing synaptic plasticity in behaving animals during a learning task.