Learning prediction error neurons in a canonical interneuron circuit

Sensory systems constantly compare external sensory information with internally generated predictions. While neural hallmarks of prediction errors have been found throughout the brain, the circuit-level mechanisms that underlie their computation are still largely unknown. Here, we show that a well-orchestrated interplay of three interneuron types shapes the development and refinement of negative prediction-error neurons in a computational model of mouse primary visual cortex. By balancing excitation and inhibition in multiple pathways, experience-dependent inhibitory plasticity can generate different variants of prediction-error circuits, which can be distinguished by simulated optogenetic experiments. The experience-dependence of the model circuit is consistent with that of negative prediction-error circuits in layer 2/3 of mouse primary visual cortex. Our model makes a range of testable predictions that may shed light on the circuitry underlying the neural computation of prediction errors.


Introduction
Changes in sensory inputs can arise from changes in our environment, but also from our own movements. When you walk through a room full of people, your perspective changes over time, and you will experience a global visual flow.
Superimposed on this global change are local changes generated by the movements of the people around you. An essential task of sensory perception is to disentangle these different origins of sensory inputs, because the appropriate behavioral responses to environmental and to self-generated changes are often different. Am I approaching a person or is she approaching me?
A common assumption is that perceptual systems subtract from the sensory data an internal prediction 1-6 , which is calculated from an efference copy of the motor signals our brain has issued. Changes in the external world then take the form of mismatches or prediction errors between internal predictions and sensory data. This comparison requires an accurate prediction system that adapts to ongoing changes in the environment or in behavior. An efficient way to ensure a flexible adaptation is to render the prediction circuits experience-dependent by minimizing prediction errors.
Neural hallmarks of prediction errors are found throughout the brain. Dopaminergic neurons in the basal ganglia and the striatum 7 encode a reward prediction error (mismatch between expected and received reward), and subsets of neurons in visual cortex 8,9 , auditory cortex 10,11 and barrel cortex 12 code for a mismatch between feedback and feedforward information.
While neural correlates of prediction errors have been found broadly, the circuit-level mechanisms that underlie their computation are poorly understood. Given that prediction errors involve a subtraction of expectations from sensory data, the relevant circuits likely involve both excitatory and inhibitory pathways 9 . Negative prediction-error (nPE) neurons, which are activated only when sensory signals are weaker than predicted, are likely to receive excitatory predictions counterbalanced by inhibitory sensory signals. Conversely, positive prediction-error (pPE) neurons, which respond only when sensory signals exceed the internal prediction, could receive excitatory sensory signals counterbalanced by inhibitory predictions. How the complex inhibitory circuits of the cortex 13-16 support the computations of these prediction errors is not resolved and neither are the activity-dependent forms of plasticity that would allow these circuits to refine the prediction machine.
For prediction-error neurons, fully predicted sensory signals should cancel with the internal prediction and hence trigger no response. We therefore hypothesized that an experience-dependent formation and refinement of prediction-error circuits can be achieved by balancing excitation and inhibition in an activity-dependent way. Using a computational model comprised of excitatory pyramidal cells and three types of inhibitory interneurons, we show that nPE neurons can be learned by inhibitory synaptic plasticity rules that balance excitation and inhibition in principal cells. We find that the circuit shows a similar experience dependence as observed in V1 9 . Depending on which interneuron classes receive motor predictions and which receive sensory signals, the plasticity rules shape different, fully functional variants of the prediction circuit. Using simulated optogenetic experiments, we show that these variants have identifiable fingerprints in their reaction to optogenetic activation or inactivation of different interneuron classes. Finally, we demonstrate that the inhibitory prediction circuits can be learned by biologically plausible forms of homeostatic inhibitory synaptic plasticity, which only rely on local information available at the synapses.

Results
We studied a rate-based network model of layer 2/3 of rodent V1 to investigate how negative prediction-error (nPE) neurons develop. The model includes excitatory pyramidal cells (PCs) as well as inhibitory parvalbumin-expressing (PV), somatostatin-expressing (SOM) and vasoactive intestinal peptide-expressing (VIP) interneurons (Fig. 1 a). All neurons in the model receive excitatory background input that ensures reasonable baseline activities in the absence of visual input and motor-related internal predictions ("baseline"). A subset of inhibitory synapses, chosen based on a mathematical analysis, is subject to experience-dependent plasticity, which homeostatically controls the firing rate of PCs by balancing excitation and inhibition 17 (see Methods and Fig. 1 a). We stimulated the network with time-varying external inputs that represent visual stimuli and motor-related internal predictions (Fig. 1 a,b). We reasoned that during natural conditions, movements lead to sensory inputs that are fully predicted by internal motor commands ("feedback phase" 9 ), while unexpected external changes in the environment should generate unpredicted sensory signals ("playback phase" 9 ). Situations in which internal motor commands are not accompanied by corresponding sensory signals should be rare ("feedback mismatch phase" 9 ). During plasticity, we therefore stimulated the circuit with a sequence consisting of feedback and playback phases ("quasi-natural training", Fig. 1 b).

Figure 1. (a) [...] shown for the sake of clarity. Somatic compartment of PCs, SOM and PV neurons receive visual input, apical dendrites of PCs and VIP neurons receive a motor-related prediction thereof. Connections marked with an asterisk undergo experience-dependent plasticity. (b) During plasticity, the network is exposed to a sequence of feedback (coupled sensorimotor experience) and playback phases (black square, visual input not predicted by motor commands). Stimuli last for 1 second and are alternated with baseline phases (absence of visual input and motor predictions). (c) Left: Before plasticity, somatic excitation (light red) and inhibition (light blue) in PCs are not balanced. Excitatory and inhibitory currents shifted by ± 20 pA for visualization. The varying net excitatory current (gray) causes the PC population rate to deviate from baseline. Right: Response relative to baseline (∆R/R) of all PCs in feedback (FB), mismatch (MM) and playback (PB) phase, sorted by amplitude of mismatch response. None of the PCs are classified as nPE neurons (indicated by gray shading to the right). (d) Same as in (c) after plasticity. Somatic excitation and inhibition are balanced. PC population rate remains at baseline. All PCs classified as nPE neurons (also indicated by black shading to the right).

Negative prediction-error neurons emerge by balancing excitation and inhibition
Before the onset of plasticity, synaptic connections were randomly initialized, so PCs receive unbalanced excitation and inhibition. Therefore, all PCs change their firing rate in response to both feedback and playback stimuli, indicating the absence of nPE neurons (Fig. 1 c). During quasi-natural sensorimotor experience, inhibitory plasticity strengthens or weakens inhibitory synapses to diminish the deviations of the PC firing rates from their baseline firing rate (Supplementary Fig. S1). At the same time, dendritic inhibition mediated by SOM interneurons is sufficiently strengthened to suppress the motor prediction arriving at the apical dendrite. After synaptic plasticity, somatic excitation and inhibition are balanced on a stimulus-by-stimulus basis (Fig. 1 d). PCs merely show small and transient onset/offset responses to feedback and playback stimuli. In contrast, all PCs show an increase in activity for feedback mismatch stimuli (Fig. 1 d). Hence, inhibitory synaptic plasticity generates nPE neurons by balancing excitation and inhibition in PCs for quasi-natural conditions.

Balance of excitation, inhibition and disinhibition in different functional prediction circuits
It is not fully resolved which interneuron types receive sensory inputs, motor signals or both. The circuit we studied so far was motivated by the widely accepted view that PCs and SOM and PV interneurons show visual responses 9,18-23 , while long-range (motor) predictions arrive in the superficial layers of V1 and target VIP neurons 9,14,22,24 and the apical and distal compartments of PCs 9,21 . Because this view is not uncontested 24 , we also considered alternative input configurations for PCs and PV neurons. We found that inhibitory plasticity establishes nPE neurons independent of the input configuration onto PCs and PV neurons (Fig. 2 b-e, right). The emerging connectivity of the interneuron circuits varied, however. For PCs not to respond above baseline in feedback and playback phase, various excitatory, inhibitory, disinhibitory and dis-disinhibitory pathways need to be balanced. An informative example is the input configuration in which PCs receive visual input and PV neurons receive motor predictions (Fig. 2 c). In this case, visual inputs arrive at the PCs as direct excitation, as disinhibition through the SOM-PV pathway, and as dis-disinhibition via the SOM-VIP-PV pathway (Fig. 2 a). To keep the PCs at their baseline during the playback phase, these three pathways need to be balanced (Fig. 2 c, left). Similarly, motor signals arrive at the PCs as inhibition from PV neurons, disinhibition via the VIP-PV pathway, dis-disinhibition via the VIP-SOM-PV pathway and as direct excitation to the dendrite that is canceled by SOM-mediated inhibition. Again, all these pathways need to be balanced to keep the PCs at their baseline for fully predicted visual stimuli (Fig. 2 c, left). Analogous balancing arguments hold for other input configurations (Fig. 2 b-e, left).
While the flow of visual and motor information in the learned inhibitory microcircuit is different for different input configurations, the neural responses of the different interneuron classes provide limited information about the input configuration. PV neuron activity reflects whether PCs receive visual input: If PCs receive visual input, PV responses increase during feedback and playback phases to balance the sensory input at the soma of PCs (Fig. 2 b-c, right).
If PCs receive no visual input, PV neurons remain at their baseline firing rate (Fig. 2 d-e, right). The activity of SOM and VIP neurons varies between playback, feedback and mismatch phases, but is independent of the input configuration for PCs and PV interneurons (Fig. 2 b-e, right).
In summary, inhibitory plasticity can establish functional nPE circuits irrespective of the inputs onto the soma of PCs and PV neurons. Although the underlying circuits vary substantially in the specific balance of pathways, the neural activity patterns only weakly reflect the underlying information flow.

Simulated optogenetic manipulations disambiguate prediction circuits
We hypothesized that the need to simultaneously balance several pathways offers a way to disambiguate the different prediction circuits by optogenetic manipulations. To test this, we systematically suppressed or activated PV, SOM and VIP interneurons in each input configuration after inhibitory plasticity had established the respective nPE circuit.
We found that in our model, such simulated optogenetic experiments are highly informative about the underlying input configuration (Fig. 3).

Fraction of nPE neurons is modulated by inputs to SOM and VIP interneurons
In the model considered so far, all PCs developed into nPE neurons during learning, irrespective of the inputs to PCs and PV interneurons. However, nPE neurons represent only a small fraction of neurons in mouse V1 8,9 . Given that in our model, motor predictions arriving at the apical dendrites are canceled by SOM neuron-mediated inhibition, we hypothesized that the fraction of PCs that develop into nPE neurons depends on the distribution of visual and motor inputs onto SOM and VIP neurons. [...]
In summary, the fraction of nPE neurons that develop during learning depends on the distribution of visual input and motor predictions onto both SOM and VIP neurons.

Experience-dependence of mismatch and interneuron responses
Attinger et al. 9 showed that the number of nPE neurons and the strength of their mismatch responses decrease when mice are trained in artificial conditions, in which motor predictions and visual flow were uncorrelated ("non-coupled training").
To test whether the model shows the same experience-dependence, we generated a modified training phase, in which visual inputs and motor-related predictions were statistically independent (Fig. 5 a). We found that the number of nPE neurons and their mismatch responses also decrease for non-coupled trained relative to quasi-natural trained networks (Fig. 5 b). This decrease is primarily due to changes in PCs and PV neurons, while the responses of SOM and VIP neurons during the mismatch phase are largely independent of the training paradigm (Fig. 5 c). Hence, the experience-dependence of the model circuit is in line with that of nPE neurons in rodent V1 9 .

nPE circuits can also be learned by biologically plausible learning rules
In our model, nPE neurons developed through inhibitory plasticity that establishes an excitation-inhibition (E/I) balance in PCs. So far, we used learning rules that approximate a backpropagation of error 25 , which changed SOM→PV and VIP→PV connections so as to minimize the difference between the PC firing rate and a baseline rate. The biological plausibility of such backpropagation rules, which are broadly used in artificial intelligence, is still debated, because they rely on information that is not locally available at the synapse in question 26,27 . We therefore wondered whether prediction-error circuits can also be established by biologically plausible local learning rules.
We found that nPE neurons also emerged when the backpropagation rules were replaced by a form of plasticity that changes SOM→PV and VIP→PV synapses in proportion to the difference between the excitatory recurrent drive onto PV neurons and a target value (Fig. 6 a). This local form of learning also balanced excitation and inhibition (Fig. 6 b,c) and all PCs developed into nPE neurons (Fig. 6 c).
The plasticity rules can be further simplified when PCs do not receive visual information. In this case, the strength of SOM→PV and VIP→PV synapses can be learned according to a homeostatic rule 17 that aims to sustain a target rate in the PV neurons (Supplementary Fig. S3).
In summary, the backpropagation-like learning rules for the synapses onto PV neurons can be approximated by biologically plausible rules that exploit local information available at the respective synapses.

Discussion
How the nervous system disentangles self-generated and external sensory stimuli is a long-standing question 1,2,6 . Here, we investigated the circuit-level mechanisms that underlie the computation of prediction errors and how different types of inhibitory neurons shape these prediction circuits. We used computational modelling to show that nPE neurons can be learned by balancing excitation and inhibition in cortical microcircuits with three types of interneurons. We show that the required E/I balance can be achieved by biologically plausible forms of synaptic plasticity. Furthermore, the experience-dependence of the circuit is similar to that of nPE circuits in mouse V1 9 .
Our model makes a number of predictions. Firstly, the multi-pathway balance of excitation and inhibition suggests that the input configuration of the prediction circuit could be disambiguated using cell type-specific modulations of neural activity. This could be achieved by optogenetic or pharmacogenetic manipulations, or by exploiting the differential sensitivity of interneuron classes to neuromodulators. The precarious nature of an exact multi-pathway balance also suggests that nPE neurons might change their response characteristics in a context-dependent way, e.g., by neuromodulatory effects.
Secondly, the central assumption of the model is that nPE neurons emerge by a self-organized E/I balance during sensorimotor experience. It therefore predicts that (i) sensorimotor experience that the animal is habituated to should lead to balanced excitation and inhibition in PCs, (ii) E/I balance should break for sensorimotor experience the animal has rarely encountered, e.g., for mismatches of sensory stimuli and motor predictions and (iii) during altered sensorimotor experience in a virtual reality setting or when the excitability of specific interneuron types is altered, interneuron circuits should gradually reconfigure to reestablish the E/I balance.
During learning, we exposed the network to sensory inputs and motor-related predictions designed to reflect coupled sensorimotor experience. To allow for changes in the external world that do not arise from the animal's own movements, we included "playback" phases in which the visual input is stronger than predicted by the motor-related input.
Consistent with the experimental setup of Attinger et al. 9 , we deliberately excluded feedback mismatch phases. In the model, the stimuli experienced during learning have a strong impact on the response structure of the PCs, because the learning rules aim to keep the PCs at a given baseline rate at all times. The inclusion of feedback and playback phases during learning therefore leads to neurons that remain at their baseline during those phases, in line with nPE neurons.
In mouse V1, nPE neurons exhibit an average rate decrease during playback when the animals were only exposed to perfectly coupled sensorimotor experience 9 . When our network was trained in the same way, we also observed that PCs reduced their firing rate during playback phases (Supplementary Fig. S4). This can be a result of an excess of somatic inhibition, dendritic inhibition or both. The model hence predicts that the rate reduction during playback phases observed by Attinger et al. 9 vanishes when playback phases are included during training.
The interneuron circuit in our model is motivated by the canonical circuit found in a variety of brain regions 15,16,28 .
In addition to the connections between interneuron classes that are frequently reported as strong and numerous, we included VIP→PV synapses in the circuit, because a mathematical analysis reveals that they are required for a perfect E/I balance during both feedback and playback phases (see Supplementary Notes). While VIP→PV synapses have been found in visual 15 , auditory 29 , somatosensory 28,30 and medial prefrontal cortex 29 , as well as amygdala 31 , they are less prominent and often weaker than SOM→PV connections (but see Krabbe et al. 31 ). VIP→PV synapses can be excluded when the conditions for nPE neurons during feedback and playback phases are mildly relaxed 8,9,11 and when PV neurons receive visual, but not motor inputs ( Supplementary Fig. S5).
We used a mathematical analysis to identify a number of synapses in the circuit that undergo experience-dependent changes. While the synapses from PV neurons onto PCs established a baseline firing rate in the absence of visual input and motor predictions, the synergy between the SOM→PV, VIP→PV and SOM→PC synapses guaranteed that the baseline is retained in feedback and playback phase. Our mathematical analysis unveiled constraints for the interneuron motif, that is, the relation between the strengths of a number of inhibitory synapses (see Methods, Eqs. 8,9). The multi-pathway balance of excitation and inhibition could also be achieved by synaptic plasticity in other inhibitory synapses, for example the mutual inhibition between SOM and VIP neurons. However, the assumption that mainly the inhibitory synapses onto PV neurons are plastic is supported by the observation that PV neuron activity, in contrast to SOM and VIP neuron activity, is experience-dependent 9 .
In the model, the plastic inhibitory synapses onto PV neurons change according to non-local information that might not be directly available at the synapse. These synapses therefore implement an approximation of a backpropagation of error, the biological plausibility of which is debated 26 . We showed that this plasticity rule can be approximated by biologically plausible variants of the plasticity rules. If PCs do not receive direct visual input (Supplementary Fig. S3), the backpropagation-like algorithm can be replaced by a simple homeostatic Hebbian plasticity rule in the synapses onto the PV interneurons. Given that PCs in V1 are known to receive substantial visual drive 19,20 , this assumption is unlikely to be valid. We therefore propose an alternative form of plasticity that changes SOM→PV and VIP→PV synapses in proportion to the difference between the excitatory recurrent drive onto PV neurons and a target value (Fig. 6). The underlying mechanism is similar to feedback alignment 32 and requires sufficient overlap between the set of postsynaptic PCs a PV neuron inhibits and the set of presynaptic PCs the same PV neuron receives excitation from. This is likely, given the high connection probability between PCs and PV neurons 15,16,33 .
We modelled the apical dendrite of PCs as a single compartment that integrates excitatory and inhibitory input currents and has the potential to produce calcium spike-like events 34-37 . Moreover, we assumed that an overshoot of inhibition decouples the apical tuft of the PCs from their soma, by including a rectifying non-linearity that prevents an excess of dendritic inhibition from influencing somatic activity. However, the presence or nature of these dendritic nonlinearities has a minor influence on the development of nPE neurons (Supplementary Fig. S6). When we allowed dendritic inhibition to influence the soma, inhibitory plasticity still established nPE neurons, although the learned interneuron circuit differs with respect to the synaptic strengths. The additional dendritic inhibition reduces the required amount of somatic, PV-mediated inhibition. This is primarily the case during playback phases, when the excitatory motor input to the apical dendrite is absent. PV neurons are therefore less active during the playback phase than during the feedback phase (Supplementary Fig. S6), consistent with recordings in mouse V1 9 .
By modelling the apical dendrite as a single compartment, we also neglected the possibility that dendritic branches process distinct information. However, we expect that the suggested framework of generating predictive signals by a compartment-specific E/I balance generalizes to more complex dendritic configurations, in which local inhibition could contribute by gating different dendritic inputs 38 .
Cortical circuits are complex and contain a large variety of interneuron classes 13,14,16 . We restricted the model to three of these classes: PV, SOM and VIP neurons. It is conceivable that several other interneuron types can play a pivotal role in prediction-error circuits. The dendrites of layer 2/3 neurons reach out to layer 1, the major target for feedback connections 21,39,40 and home to a number of distinct interneuron types 41,42 , which may contribute to associative learning 43,44 . In particular, NDNF neurons unspecifically inhibit apical dendrites located in the superficial layers, and at the same time receive strong inhibition from SOM neurons 43 . Hence, it is possible that these interneurons also shape the processing of feedback information, including the computation of prediction errors.
PCs in L2/3 of V1 have very low spontaneous firing rates 20,45 . A potential rate decrease during feedback and playback could hence be hard to detect. Whether the low responses of nPE neurons during feedback and playback phases are due to an E/I balance as suggested here or due to an excess of inhibition may hence be difficult to decide, and could for example be resolved by intracellular recordings.
Our model suggests a well-orchestrated division of labor of PV, SOM and VIP interneurons that is shaped by experience: While PV neurons balance the sensory input at the somatic compartment of PCs, SOM neurons cancel feedback signals at the apical dendrites. VIP neurons ensure sufficiently large mismatch responses by amplifying small differences between feedforward and feedback inputs 9,37 . Given the relative uniformity of cortex in its appearance, structure and cell types 46,47 , it is conceivable that the same principles also hold for other regions of the cortex beyond V1. Shedding light on the mechanisms that constitute the predictive power of neuronal circuits may in the long run contribute to an understanding of psychiatric disorders that have long been associated with a malfunction of the brain's prediction machinery 48-50 and specific types of interneurons 51-53 .

Network model
We simulated a rate-based network model of excitatory pyramidal cells ($N_{PC} = 70$) and inhibitory PV, SOM and VIP neurons ($N_{PV} = N_{SOM} = N_{VIP} = 10$). All neurons are randomly connected with connection strengths and probabilities given below (see "Connectivity").
The excitatory pyramidal cells are described by a two-compartment rate model that was introduced by Murayama et al. 36 . The dynamics of the firing rate $r_{E,i}$ of the somatic compartment of neuron $i$ are governed by the excitatory rate time constant $\tau_E$ ($\tau_E = 60$ ms) and the rheobase $\Theta$ of the neuron ($\Theta = 14$ s$^{-1}$). Firing rates are rectified to ensure positivity.
The total somatic input $I_i$ is generated by somatic and dendritic synaptic events and potential dendritic calcium spikes. It contains a rectifying nonlinearity, $[x]_+ = \max(x, 0)$, that prevents an excess of inhibition at the apical dendrite from reaching the soma. $I^{syn}_{D,i}$ and $I^{syn}_{E,i}$ are the total synaptic inputs into dendrite and soma, respectively, and $c_i$ denotes a dendritic calcium event. $\lambda_D$ and $\lambda_E$ are the fractions of "currents" leaking away from dendrites and soma, respectively ($\lambda_D = 0.27$, $\lambda_E = 0.31$).
The synaptic input to the soma, $I^{syn}_{E,i}$, is given by the sum of external sensory inputs $x_E$ and PV neuron-induced (P) inhibition. The dendritic input $I^{syn}_{D,i}$ is the sum of motor-related predictions $x_D$, the recurrent, excitatory connections from other PCs and SOM neuron-induced (S) inhibition. The weight matrices $w_{EP}$, $w_{DS}$ and $w_{DE}$ denote the strength of connection between PV neurons and the soma of PCs ($w_{EP}$), SOM neurons and the dendrites of PCs ($w_{DS}$) and the recurrence between PCs ($w_{DE}$), respectively.
The input generated by a calcium spike depends on $c$, which scales the amount of current produced ($c = 7$ s$^{-1}$), the Heaviside step function $H$, a threshold $\Theta_c$ that describes the minimal input needed to produce a Ca$^{2+}$-spike ($\Theta_c = 28$ s$^{-1}$), and $I^0_{D,i}$, the total, synaptically generated input in the dendrites. Note that we incorporated the gain factor present in Murayama et al. 36 into the parameters to achieve unit consistency for all neuron types.
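For orientation, one set of equations consistent with the verbal definitions above can be written as follows. This is a sketch assembled from the description in the text, not a verbatim copy of the model equations. In particular, the way the leak fractions $\lambda_D$ and $\lambda_E$ enter $I_i$ and $I^0_{D,i}$ is an assumption.

```latex
\begin{aligned}
\tau_E \frac{dr_{E,i}}{dt} &= -r_{E,i} + I_i - \Theta \\
I_i &= \lambda_D \left[ I^{syn}_{D,i} + c_i \right]_+ + (1 - \lambda_E)\, I^{syn}_{E,i} \\
I^{syn}_{E,i} &= x_{E,i} - \sum_j w_{EP,ij}\, r_{P,j} \\
I^{syn}_{D,i} &= x_{D,i} - \sum_j w_{DS,ij}\, r_{S,j} + \sum_j w_{DE,ij}\, r_{E,j} \\
c_i &= c \cdot H\!\left( I^0_{D,i} - \Theta_c \right) \\
I^0_{D,i} &= \lambda_E\, I^{syn}_{E,i} + (1 - \lambda_D)\, I^{syn}_{D,i}
\end{aligned}
```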
The firing rate dynamics of each interneuron are modeled by a rectified, linear differential equation 54 , in which $r_{X,i}$ denotes the firing rate of neuron $i$ from neuron type $X$ ($X \in \{P, S, V\}$) and $x_i$ represents external inputs.
The weight matrices $w_{XY}$ denote the strength of connection between the presynaptic neuron population $Y$ and the postsynaptic neuron population $X$. The rate time constant $\tau_i$ was chosen to resemble a fast GABA$_A$ time constant, and set to 2 ms for all interneuron types included.
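The following Python sketch illustrates how the right-hand sides of the two rate equations could be evaluated for a single neuron. The functional forms are assumptions inferred from the verbal description (notably how $\lambda_D$ and $\lambda_E$ enter, and the strict inequality used for the Heaviside function). All function names are hypothetical and are not taken from the simulation code.

```python
# Parameter values quoted in the text (rates and inputs in 1/s, time in s).
TAU_E = 0.060     # excitatory rate time constant
TAU_I = 0.002     # interneuron rate time constant
THETA = 14.0      # rheobase of the pyramidal cell
LAMBDA_D = 0.27   # fraction of current leaking away from the dendrite
LAMBDA_E = 0.31   # fraction of current leaking away from the soma
C_SPIKE = 7.0     # current produced by a dendritic calcium event
THETA_C = 28.0    # threshold for a dendritic calcium event


def calcium_event(i_syn_e, i_syn_d):
    """Current from a calcium event; the form of the dendritic drive is assumed."""
    i0_d = LAMBDA_E * i_syn_e + (1.0 - LAMBDA_D) * i_syn_d
    return C_SPIKE if i0_d > THETA_C else 0.0


def total_somatic_input(i_syn_e, i_syn_d):
    """Somatic input; the rectification blocks excess dendritic inhibition."""
    dendritic = max(i_syn_d + calcium_event(i_syn_e, i_syn_d), 0.0)
    return LAMBDA_D * dendritic + (1.0 - LAMBDA_E) * i_syn_e


def pc_rate_derivative(r_e, i_syn_e, i_syn_d):
    """Time derivative of the somatic PC rate (assumed leaky rate dynamics)."""
    return (-r_e + total_somatic_input(i_syn_e, i_syn_d) - THETA) / TAU_E


def interneuron_rate_derivative(r, external, exc_drive, inh_drive):
    """Time derivative of an interneuron rate (linear; rates are rectified after each step)."""
    return (-r + external + exc_drive - inh_drive) / TAU_I
```

In this sketch, any dendritic input below zero is cut off by the rectification, so the somatic input then depends on the somatic synaptic input alone.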

Negative prediction-error neurons
We define PCs as nPE neurons when they exclusively increase their firing rate during feedback mismatch (visual input smaller than predicted), while remaining at their baseline during feedback and playback phases. In a linearized, homogeneous network and under the assumption that the apical dendrites are sufficiently inhibited during feedback and playback phase, this definition is equivalent to two constraints on the interneuron network (Eqs. 8 and 9; see Supporting Information for a detailed analysis and derivation). The parameters $V_X, M_X \in \{0, 1\}$ indicate whether neuron type $X$ receives visual and motor-related inputs, respectively, and control the different input configurations. In addition to the conditions Eqs. 8 and 9, the synapses from SOM neurons onto the apical dendrites must be sufficiently strong to cancel potential excitatory inputs during feedback and playback phase.
In practice, we classify PCs as nPE neurons when ∆R/R is larger than 20% in the mismatch phase and deviates by less than 10% from zero elsewhere ($\Delta R/R = (r - r_{BL})/r_{BL}$, $r_{BL}$: baseline firing rate). Tolerating small deviations in feedback and playback phase is more in line with experimental approaches. The results do not rely on the precise thresholds used for the classification.
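The practical classification criterion can be sketched in a few lines of Python. The function names are hypothetical. The thresholds are the ones stated above and are exposed as parameters because the results are said not to depend on their precise values.

```python
def relative_response(rate, baseline):
    """Response relative to baseline, Delta R / R."""
    return (rate - baseline) / baseline


def is_npe_neuron(r_feedback, r_mismatch, r_playback, r_baseline,
                  mismatch_threshold=0.20, tolerance=0.10):
    """True if the neuron responds in the mismatch phase only."""
    d_fb = relative_response(r_feedback, r_baseline)
    d_mm = relative_response(r_mismatch, r_baseline)
    d_pb = relative_response(r_playback, r_baseline)
    return (d_mm > mismatch_threshold
            and abs(d_fb) < tolerance
            and abs(d_pb) < tolerance)
```

For example, a cell with a baseline of 1 s$^{-1}$ that fires at 1.05, 1.5 and 0.95 s$^{-1}$ in feedback, mismatch and playback phases would count as an nPE neuron, whereas a cell that also fires at 1.5 s$^{-1}$ during feedback would not.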

Connectivity
All neurons are randomly connected with connection probabilities motivated by the experimental literature 15,16,28,29,33,55-57 . All cells of the same neuron type have the same number of incoming connections. In the specification of the mean connection strengths, the symbol * denotes weights that vary between simulations (e.g., subject to plasticity or computed from Eqs. 8 and 9). For non-plastic networks, these synaptic strengths are given by $w_{EP} = 2.8$, $w_{DS} = 3.5$, [...] comply with Dale's principle.
All weights are scaled in proportion to the number of existing connections (i.e., the product of the number of presynaptic neurons and the connection probability), so that the results are independent of the population size.
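A minimal sketch of such a connectivity scheme is given below. It assumes that each postsynaptic cell receives exactly round(N_pre · p) randomly chosen inputs and that the mean connection strength is divided by this number, which is one reading of the scaling described above. The function name and the connection probability used in the usage note are hypothetical.

```python
import random


def build_weights(n_post, n_pre, p_connect, mean_strength, rng):
    """Random weight matrix with a fixed number of inputs per postsynaptic cell.

    Each existing synapse carries mean_strength / k, where k is the number
    of incoming connections, so the summed input weight per cell does not
    depend on the population size (assumed scaling).
    """
    k = max(1, round(n_pre * p_connect))
    weights = [[0.0] * n_pre for _ in range(n_post)]
    for i in range(n_post):
        for j in rng.sample(range(n_pre), k):
            weights[i][j] = mean_strength / k
    return weights
```

Calling `build_weights(70, 10, 0.6, 2.8, random.Random(0))` would give every PC six PV inputs whose weights sum to the non-plastic value $w_{EP} = 2.8$; the probability 0.6 is only a placeholder.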

Inputs
All neurons receive constant, external background input that ensures reasonable baseline firing rates in the absence of visual and motor-related input. In the case of non-plastic networks, these inputs were set such that the baseline firing rates are $r_E = 1$ s$^{-1}$, $r_P = 2$ s$^{-1}$, $r_S = 2$ s$^{-1}$ and $r_V = 4$ s$^{-1}$. In the case of plastic networks, we set the external inputs to $x_E = 28$ s$^{-1}$, $x_D = 0$ s$^{-1}$, $x_P = 2$ s$^{-1}$, $x_S = 2$ s$^{-1}$ and $x_V = 2$ s$^{-1}$ (if not stated otherwise). In addition to the external background inputs, the neurons receive either visual input ($v$), a motor-related prediction thereof ($m$) or both.
In line with the experimental setup of Attinger et al. 9 , we distinguish between baseline ($m = v = 0$), feedback ($m = v > 0$), feedback mismatch ($m > v$) and playback ($m < v$) phases. During training, the network is exposed to feedback and playback phases with stimuli drawn from a uniform distribution on the interval $[0, 7]$ s$^{-1}$. After learning, the strength of stimuli is set to 7 s$^{-1}$ (plastic networks) or 3.5 s$^{-1}$ (non-plastic networks).
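The phase definitions and a quasi-natural training sequence can be sketched as follows. Two choices here are assumptions and not taken from the text: feedback and playback stimuli are drawn with equal probability, and the motor-related input is set to zero during playback. The function names are hypothetical.

```python
import random


def classify_phase(v, m):
    """Name the stimulus phase from visual input v and motor prediction m."""
    if v == 0 and m == 0:
        return "baseline"
    if m == v:
        return "feedback"
    if m > v:
        return "feedback mismatch"
    return "playback"


def training_sequence(n_stimuli, rng, max_strength=7.0):
    """List of (v, m) pairs; every stimulus is followed by a baseline phase."""
    sequence = []
    for _ in range(n_stimuli):
        strength = rng.uniform(0.0, max_strength)
        if rng.random() < 0.5:
            sequence.append((strength, strength))  # feedback: m = v
        else:
            sequence.append((strength, 0.0))       # playback: v without m
        sequence.append((0.0, 0.0))                # baseline
    return sequence
```

By construction, such a sequence never contains a feedback mismatch phase, so mismatch stimuli are only presented after learning.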

Plasticity
In plastic networks, a number of connections between neurons are subject to experience-dependent changes in order to establish an E/I balance for PCs. PV→PC and PC→PV synapses establish the target firing rates for PCs and PV neurons, respectively. VIP→PV and SOM→PV synapses and the synapses from SOM neurons onto the apical dendrites of PCs ensure that PCs remain at their baseline during feedback and playback phase. In detail, the connections from PV and SOM neurons onto the soma and the apical dendrites, respectively, obey inhibitory Hebbian plasticity rules akin to Vogels et al. 17 . In these rules, the parameter $\rho^{post}_{E,0}$ denotes the baseline firing rate of the postsynaptic PC, and the dendritic activity $A^{post}_i$ is given by the rectified synaptic events at the dendrites. A small "correction" term (here set to 0.1 s$^{-1}$) eases the effect of strong onset responses.
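How a rule of this type balances excitation and inhibition can be illustrated with a toy example. The update below assumes the generic form Δw ∝ (r_post − ρ_0) · r_pre for an inhibitory synapse, in the spirit of Vogels et al. 17 . The single-cell steady-state rate, the learning rate and the function names are illustrative assumptions and not part of the model.

```python
def pc_rate(excitation, w_inh, r_pre):
    """Steady-state rate of a toy cell with one inhibitory input (rectified)."""
    return max(excitation - w_inh * r_pre, 0.0)


def learn_inhibitory_weight(excitation, r_pre, rho0,
                            eta=0.01, n_steps=2000, w0=0.0):
    """Adapt an inhibitory weight until the postsynaptic rate reaches rho0.

    Assumed rule: the weight grows when the postsynaptic rate exceeds its
    target and shrinks otherwise; weights stay non-negative (Dale's principle).
    """
    w = w0
    for _ in range(n_steps):
        r_post = pc_rate(excitation, w, r_pre)
        w += eta * (r_post - rho0) * r_pre
        w = max(w, 0.0)
    return w, pc_rate(excitation, w, r_pre)
```

With an excitatory drive of 10 s$^{-1}$, a presynaptic rate of 2 s$^{-1}$ and a target of 1 s$^{-1}$, the weight settles where inhibition cancels all excitation above the target rate.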
The connections from both SOM and VIP neurons onto PV neurons implement an approximation of a backpropagation of error. When the connection probability between PCs and PV neurons is large, this backpropagation of error can be replaced by a biologically plausible learning rule that relies only on local information available in the PV neurons. In this rule, $\Delta E_{\mathrm{rec},i}$ denotes the difference between the excitatory recurrent drive onto PV neuron i and a target value, and $S^{\mathrm{pre}}_i$ denotes the set of presynaptic PCs from which a particular PV neuron receives excitation.
When nPE neurons do not receive direct visual input, the backpropagation rules can be simplified even further.
The synapses from neuron type X ∈ {S, V} onto PV neurons can then be learned according to a Hebbian inhibitory plasticity rule 17 that aims to sustain a baseline rate in the PV neurons. This baseline rate is established by modifying the connections from PCs onto PV neurons according to an anti-Hebbian plasticity rule $\Delta w_{PE,ij} \propto (\rho^{\mathrm{post}}_{P,0} - r^{\mathrm{post}}_{P,i}) \cdot r^{\mathrm{pre}}_{E,j}$.
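As an illustration, the anti-Hebbian rule for the PC→PV synapses can be written as a single weight update. This is a minimal sketch: the learning rate `lr`, the array layout and the clipping of weights at zero are assumptions made here, not details given in the text.

```python
import numpy as np


def update_pc_to_pv_weights(w_pe, r_pv, r_pc, rho_pv_target, lr=1e-3):
    """One step of dw_PE,ij ∝ (rho_P0 - r_P,i) * r_E,j.

    w_pe: array of shape (n_pv, n_pc), weights from PCs (columns)
          onto PV neurons (rows).
    r_pv: postsynaptic PV rates, shape (n_pv,).
    r_pc: presynaptic PC rates, shape (n_pc,).
    """
    dw = lr * np.outer(rho_pv_target - r_pv, r_pc)
    # Assumption: excitatory weights are clipped at zero to respect Dale's law.
    return np.maximum(w_pe + dw, 0.0)


w_pe = np.full((2, 3), 0.5)
r_pv = np.array([1.0, 3.0])  # first PV neuron below target, second above
r_pc = np.ones(3)
w_new = update_pc_to_pv_weights(w_pe, r_pv, r_pc, rho_pv_target=2.0)
```

In this example the excitatory weights onto the PV neuron firing below its target increase, whereas those onto the PV neuron firing above its target decrease, which drives both rates towards the baseline.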
Simulation
All simulations were performed in customized Python code written by LH. Differential equations were numerically integrated using a second-order Runge-Kutta method with time steps between 0.05 and 2 ms. Neurons were initialized with $r_i(0) = 0$. Source code will be made publicly available upon publication.
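A second-order Runge-Kutta integration of a single rate equation can be sketched as follows. The text does not specify which second-order variant was used; the explicit midpoint method is assumed here, and the time constant, input and simulation length are hypothetical values chosen for illustration.

```python
import math


def rk2_step(f, r, t, dt):
    """One explicit midpoint (second-order Runge-Kutta) step for dr/dt = f(r, t)."""
    k1 = f(r, t)
    k2 = f(r + 0.5 * dt * k1, t + 0.5 * dt)
    return r + dt * k2


def integrate_rate(x, tau=10.0, dt=0.05, t_end=50.0):
    """Integrate tau * dr/dt = -r + x from r(0) = 0 (times in ms)."""
    def f(r, t):
        return (-r + x) / tau

    r, t = 0.0, 0.0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        r = rk2_step(f, r, t, dt)
        t += dt
    return r


r_numeric = integrate_rate(x=2.0)
r_exact = 2.0 * (1.0 - math.exp(-50.0 / 10.0))  # analytical solution at t_end
```

For this linear test equation, the numerical result can be compared directly with the analytical solution $r(t) = x\,(1 - e^{-t/\tau})$.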

Constraints for the interneuron circuit
To derive the constraints for the interneuron network that are imposed by the presence of nPE neurons, we performed a mathematical analysis of a simplified network model, in which the nonlinearity of the dendritic compartment and the rectifying nonlinearities are neglected. This reduces the network to an analytically tractable linear system. The simplifications rely on the following assumptions:
1. During baseline, feedback and playback phases, SOM interneuron-mediated inhibition exceeds excitatory motor predictions arriving at the apical dendrites of PCs.
2. Any excess of inhibition in the dendrite does not affect the soma of PCs.
3. During baseline, feedback and playback phases, all neuron types have positive firing rates, such that the rate rectification can be neglected.
These assumptions allow us to omit the dendritic compartment of PCs and consequently all synapses thereto. The remaining system of linear equations describes the activity of all neuron types during baseline, feedback and playback phases. For the subsequent analysis, we furthermore consider a homogeneous network, that is, all weights, neuronal properties and the number of incoming connections for cells of the same type are the same. As a result, we can reduce the high-dimensional system to four equations, each describing the dynamics of one representative firing rate per neuron type. In these equations, τ denotes the rate time constant, $\mathbf{r} = [r_E, r_P, r_S, r_V]^T$ is the vector of firing rates (subscripts refer to the different neuron types; E: soma of PC, P: PV, S: SOM, V: VIP), Ω is the weight matrix and X denotes the external inputs. The steady-state firing rates are determined by the effective connectivity matrix W, which includes the leak. The weight parameters $w_{XY}$ between neuron types are strictly positive to maintain the excitatory/inhibitory nature of the various neuron types.
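The steady state of such a reduced linear rate network can be computed numerically, as the following sketch shows. It assumes dynamics of the generic form $\tau\,\dot{\mathbf{r}} = -\mathbf{r} + \Omega\,\mathbf{r} + \mathbf{X}$ with the signs of the connections written explicitly into Ω; this form, the sign convention and all numerical values below are illustrative assumptions and may differ from the equations of the model.

```python
import numpy as np


def steady_state(omega, x):
    """Solve 0 = -r + omega @ r + x for r, i.e. (I - omega) r = x."""
    effective = np.eye(omega.shape[0]) - omega  # leak plus signed weights
    return np.linalg.solve(effective, x)


# Hypothetical signed weights; rows are targets, columns are sources,
# in the order E (PC soma), P (PV), S (SOM), V (VIP).
OMEGA = np.array([
    [0.0, -1.0,  0.0,  0.0],   # PC soma inhibited by PV
    [1.0, -0.5, -0.6, -0.3],   # PV driven by PC, inhibited by PV, SOM, VIP
    [0.5,  0.0,  0.0, -0.5],   # SOM driven by PC, inhibited by VIP
    [0.5,  0.0, -0.5,  0.0],   # VIP driven by PC, inhibited by SOM
])
X = np.array([5.0, 3.0, 2.0, 2.0])  # hypothetical external inputs

r_star = steady_state(OMEGA, X)
residual = -r_star + OMEGA @ r_star + X  # vanishes at the fixed point
```

Solving the linear system with `np.linalg.solve` avoids forming the matrix inverse explicitly. Whether the resulting rates are positive, as required by the third assumption above, has to be checked separately for a given set of weights and inputs.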
In our model, an excitatory neuron is classified as a perfect nPE neuron if its activity remains at baseline during feedback and playback phases and increases during feedback mismatch. During feedback mismatch, the PC firing rate increases with respect to the baseline as long as the motor-related excitatory inputs exceed the somatic inhibition mediated by PV neurons. The conditions according to which no change in activity occurs in either feedback or playback phase (see Eq. 24) impose constraints on the weight configuration that need to be satisfied. These are summarized in Eqs. 26 and 27, where $X_{fb}$ and $X_{pb}$ denote the excess external inputs above baseline during feedback and playback phases, respectively, with s representing a varying excitatory stimulus strength. The parameters $V_X, M_X \in \{0, 1\}$ indicate whether neuron type X receives visual and motor-related inputs, respectively, and control the different input configurations.
Canonical interneuron connectivity with VIP-to-PV synapses: We start with the connectivity motif proposed by Pfeffer et al. 15 . We also allow for connections from VIP to PV neurons. Although they are considered to be less prominent and weaker than connections from VIP to SOM neurons, and are therefore often neglected in diagrams and computational models, those synapses have been observed in various brain regions 15,28–31 . For the respective connectivity matrix, the constraints (26) and (27) defining nPE neurons yield Eqs. 33 and 34, which are the mathematical formulation of the E/I balance of multiple pathways shown in Fig. 2 and Supplementary Fig. S2.
For the derivation above, we have assumed that the motor-related input is switched off during the playback phase. This assumption, however, can be relaxed. When motor predictions are merely smaller than the actual sensory input but non-zero during playback, analogous calculations yield the same constraints.
Canonical interneuron connectivity without VIP-to-PV synapses: Without connections from VIP onto PV neurons, the constraints (26) and (27) can be simplified. As the weight $w_{PS}$ is strictly positive (see definition of the weight matrix above), the simplified constraints require the product $w_{SV} w_{VS}$ to be larger than 1. This, however, indicates that networks with rate rectification exceed a bifurcation point and enter a winner-take-all (WTA) regime, in which either VIP or SOM neurons are silent 37 .
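The transition into a WTA regime can be illustrated with a reduced two-population sketch of mutually inhibiting, rectified rate units. This is not the full model: it assumes symmetric weights w (so that the product of the mutual weights is $w^2$), equal external drive to both populations, forward-Euler integration and a small initial asymmetry; all parameter values are hypothetical.

```python
def simulate_mutual_inhibition(w, x=1.0, tau=10.0, dt=0.1, t_end=500.0):
    """Two rectified rate units (SOM-like and VIP-like) inhibiting each other.

    Integrates tau * dr/dt = -r + max(x - w * r_other, 0) with forward Euler.
    """
    r_s, r_v = 0.1, 0.0  # small initial asymmetry (assumption)
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        drive_s = max(x - w * r_v, 0.0)
        drive_v = max(x - w * r_s, 0.0)
        r_s, r_v = (r_s + dt * (-r_s + drive_s) / tau,
                    r_v + dt * (-r_v + drive_v) / tau)
    return r_s, r_v


weak = simulate_mutual_inhibition(w=0.5)    # product of weights 0.25 < 1
strong = simulate_mutual_inhibition(w=2.0)  # product of weights 4 > 1
```

With weak mutual inhibition both units settle at the same positive rate, whereas with a weight product above 1 the initially favoured unit suppresses the other one completely.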
With VIP neurons being silent in all phases except the feedback mismatch phase, the constraint on $w_{PS}$ can be recalculated from Eqs. 22 and 24 while neglecting VIP neurons. The resulting equation reveals that PV neurons must receive visual input to ensure $w_{PS} > 0$.
In summary, this mathematical analysis shows that perfect nPE neurons can only emerge when VIP neurons are silent during all phases but the feedback mismatch phase.
Please note that the same results are obtained even if connections from PV to both SOM and VIP neurons are included.

Before plasticity, somatic excitation (light red) and inhibition (light blue) at PCs are not balanced. Excitatory and inhibitory currents are shifted by ± 20 pA for visualization. The varying net excitatory current (gray) causes the PC population rate to deviate from baseline. Right: Response relative to baseline (∆R/R) of all PCs in feedback, mismatch and playback phase, sorted by amplitude of mismatch response. None of the PCs are classified as nPE neurons (indicated by gray shading to the right). (c) Same as in (b) after plasticity. Somatic excitation and inhibition are balanced. PC population rate remains at baseline. All PCs classified as nPE neurons (also indicated by black shading to the right).

Figure S4. Coupled-trained networks can produce nPE neurons that decrease their activity in playback phase. (a) During plasticity, the network is exposed to a sequence of feedback phases only, representing perfectly coupled sensorimotor experience. Network model shown in Fig. 1. Connections from VIP to PV neurons are non-plastic.

Fig. 1. Model setup modified to enable the presence of nPE neurons while abiding by Dale's law: PCs receive 0.5 × visual input. External excitatory input onto the dendrites is set such that it balances inhibition mediated by SOM neurons in the baseline phase. Additional non-linearity for synapses from SOM neurons onto the apical dendrites of PCs: $\Delta w_{DS} \propto \sigma(A_D) \cdot A_D \cdot r_S$, where $A_D$ denotes the total dendritic activity and σ is a sigmoid function. (a) Before plasticity, somatic excitation (light red) and inhibition (light blue) in PCs are not balanced. Excitatory and inhibitory currents are shifted by ± 20 pA for visualization. The varying net excitatory current (gray) causes the PC population rate to deviate from baseline.
(b) Left: Response (∆R/R) of all PCs in feedback, mismatch and playback phase, sorted by amplitude of mismatch response. All PCs change their firing rate in response to all stimulation patterns. None of the PCs are classified as nPE neurons (indicated by gray shading to the right). Right: Population responses of PV, SOM and VIP neurons in all phases. Responses are normalized between -1 and 1 such that baseline is zero. (c) Same as in (a) after plasticity. Somatic excitation and inhibition are balanced. PC population rate remains at baseline. (d) Same as in (b) after plasticity. Almost all PCs classified as nPE neurons (indicated by black/gray shading to the right). PV neurons are less active during the playback phase than during the feedback phase.