Sparse Coding with a Somato-Dendritic Rule

Cortical neurons are silent most of the time. This sparse activity is energy efficient, and the resulting neural code has favourable properties for associative learning. Most neural models of sparse coding use some form of homeostasis to ensure that each neuron fires infrequently. But homeostatic plasticity acting on a fast timescale may not be biologically plausible, and could lead to catastrophic forgetting in embodied agents that learn continuously. We set out to explore whether inhibitory plasticity could play that role instead, regulating both the population sparseness and the average firing rates. We put the idea to the test in a hybrid network where rate-based dendritic compartments integrate the feedforward input, while spiking somas compete through recurrent inhibition. A somato-dendritic learning rule allows somatic inhibition to modulate nonlinear Hebbian learning in the dendrites. Trained on MNIST digits and natural images, the network discovers independent components that form a sparse encoding of the input and support linear decoding. These findings confirm that intrinsic plasticity is not strictly required for regulating sparseness: inhibitory plasticity can have the same effect, although that mechanism comes with its own stability-plasticity dilemma. Going beyond point neuron models, the network illustrates how a learning rule can make use of dendrites and compartmentalised inputs; it also suggests a functional interpretation for clustered somatic inhibition in cortical neurons.

Author Summary

Ever since the inception of neural networks in the 1950s, their engineering applications have relied on very simple artificial neurons that ignore many of the features of the cells in our brains. Research into the finer details, such as their extensive dendritic trees, was mostly the work of biologists.
But the computational capabilities of dendrites are now attracting the attention of the machine learning community and there are attempts to make use of them in deep neural networks. Our work gives one example of the kind of computations that become possible once one steps beyond simple point neurons to include more biological details. Our topic is sparse coding, a field which studies how neural systems can discover structure in natural stimuli such as images. We show how adding dendrites to artificial neurons lets them solve the task in a different way. This may have benefits for creating robots that learn from experience, and suggests a number of electrophysiological experiments that could teach us more about how real neurons work.


Introduction
reflecting instead some fundamental structure in the input, a characteristic that reminds us of the suspicious coincidences of Barlow [10].

As for competitive learning, described by Rumelhart & Zipser [11], it aims to reduce the redundancy of the code and decorrelate the output dimensions, so that each neuron responds to a different feature. This usually involves a winner-take-all system [12], or inhibitory connections between the coding neurons [13,14], an organisation which is equivalently called lateral, recurrent or mutual inhibition.

Starting with Földiák [15], these two heuristics have been applied in a variety of sparse coding networks with rate-based [16][17][18] and then spiking neurons [19][20][21][22]. These networks have in common the use of Hebbian lateral inhibition to decorrelate the output, and of nonlinear Hebbian rules to perform projection pursuit on the feedforward input. An early example of such a rule [24] induces depression when the output activity is below average and potentiation when it is above average. This steers gradient descent towards an activity distribution with heavy tails, which typically converges onto one of the independent components. As noted by Brito & Gerstner [23], the precise shape of that nonlinear function is not critical. The trick is to keep it aligned with the activity distribution throughout learning, so that the potentiation region stays centered on the tail. Usually, this is done by enforcing a constant norm for the weight vectors, or by using a homeostatic term that moves the potentiation threshold according to the average activity of the neuron, as in the BCM rule. That homeostatic term has the effect of regulating the lifetime sparseness of the neuron and is also called intrinsic plasticity (IP) by Triesch [25], to distinguish it from synaptic plasticity.

In most models, IP needs to be faster than the Hebbian component of learning [26,27]. But in vivo, IP tends to be slower, acting over a timescale of days rather than minutes [28,29]. Besides, fast homeostasis could be particularly disruptive for animals and robots that learn continuously, and cannot assume that the feature detectors they have acquired will be stimulated at regular intervals.
Here we propose an alternative scheme that does not require fast homeostatic plasticity. The idea is to put mutual inhibition itself in control of the Hebbian nonlinearity: stimuli for which many neurons compete to respond, and neurons that are often active as well, would attract more lateral inhibition and be subject to a higher potentiation threshold. In other words, instead of using intrinsic plasticity to enforce lifetime sparseness, this scheme would regulate both the population and the lifetime sparseness through synaptic plasticity.
To do so, we need a mechanism through which the feedforward learning rule could measure the amount of competition on an input-by-input basis and use it as a negative feedback. But artificial neural networks usually employ point neurons, where all inputs are added together into a single activity variable. The consequence is that the learning rule cannot distinguish between stronger lateral inhibition (the signal to become more selective) and weaker feedforward activity that results from synaptic plasticity or from fluctuations in the input.
The solution could be to integrate the feedforward and recurrent pathways in separate neural compartments, for instance the soma and a dendrite. The dendritic compartment could then estimate the amount of somatic inhibition by comparing its local depolarisation with the somatic activity that it perceives via backpropagating action potentials.
The idea has been tried before, although not on a sparse coding task. In Körding & König [30], lateral inhibition can prevent the backpropagating action potentials from reaching the dendrites, which induces depression in dendritic synapses via spike-timing dependent plasticity. Urbanczik & Senn [31] use probabilistic spiking neurons where the dendritic compartment tries to match the somatic potential; this results in depression when unpredicted external inputs inhibit the soma, and potentiation when these unpredicted inputs are excitatory instead.
Here we set out to investigate whether a variant of these somato-dendritic learning rules could discover sparse codes in natural stimuli. We found that one can adjust the somatic and dendritic transfer functions to produce a BCM-like curve where the threshold between depression and potentiation follows an instantaneous measure of somatic inhibition. This lets the network learn sparse codes without fast intrinsic plasticity.

Network model
Our model is a network of leaky integrate-and-fire (LIF) neurons, each with an extra dendritic compartment (fig. 1). There are two fully-connected pathways: a recurrent inhibitory pathway between the somas, and a feedforward pathway between the input and the dendrites. The network is meant to model a small patch of neural tissue where full connectivity is an acceptable approximation; hence we keep the number N of neurons small (N ≤ 1024). Relative to the dimensionality M of our input stimuli, this translates to networks that range from undercomplete (N/M ≪ 1) to slightly overcomplete (N/M ≈ 1.3).

Figure 1:
Architecture of the network. Annotations indicate the feedforward input x, leaky integrate-and-fire (LIF) somas and their firing rate y, dendritic compartments and dendritic activity u, and the feedforward and recurrent pathways with weights w and v, respectively. The symbol • denotes an inhibitory synapse, ∘ an excitatory one.
The recurrent pathway mediates all-to-all inhibition via spikes and conductance-based somatic synapses. For simplicity we do not use separate inhibitory interneurons. Although that architecture deviates from biology and Dale's law, King et al. [21] found that replacing direct inhibition with interneurons did not substantially alter the results of Zylberberg et al. [20].
The feedforward pathway targets the dendrites and contains both excitatory and inhibitory synapses. It carries rates instead of spikes; doing so allows us to employ a classical Hebbian formalism in the learning rule and discrete-time dendritic compartments. A spike-based input and continuous-time dendrites would be more biologically plausible, but the model would also become substantially more complex; we reserve these for future work. Here we use a rectified linear activation function in the dendrites, with some modifications to account for the overall transfer properties of biological dendrites (see Methods for details).
The network operates as follows. We present each input pattern to the dendrites and compute the dendritic activation u. This results in a constant current flow from the dendrite to the soma while the somas compete to respond for 100 timesteps (dt = 0.5 ms). Then we compute firing rates using both the number of spikes and the spike latencies. Finally, we apply the feedforward and recurrent learning rules. We repeat these steps for the next input pattern, and so on.

Feedforward learning rules
The weight w of each feedforward, dendritic synapse is updated according to a nonlinear Hebbian rule:

Δw = η [ x (y − φ u) − α w u ]

where x is the input rate, u is the dendritic activation, y is the somatic firing rate, η is the learning rate, α controls the scale of the weights and φ sets the potentiation/depression ratio. The rule can change the sign of the weights, switching between excitatory and inhibitory synapses. It is gated by post-synaptic activity: there is no change of weight when both u and y are zero. This ensures that the weights do not fade when the neuron is silent.
The term (y − φ u) at the core of that somato-dendritic rule is reminiscent of the Delta rule [32,33], and it can be seen as a rate-based variant of the rules used in Körding & König [30] and Urbanczik & Senn [31]. In Urbanczik & Senn, the purpose of learning is to correct the mismatch between the somatic activity y and its prediction by the dendrite. Thus, they use an error-correcting term y − h(u), where h is the dendrite's own model of the somatic transfer function and h(u) ≈ y when the dendritic prediction is correct.
In contrast, here the goal is not to achieve a perfect prediction of the somatic activity by the dendrite, but to exploit the mismatch between y and φ u so that it creates a BCM-like curve modulated by inhibition (fig. 2). In the absence of somatic inhibition, we set φ so that y − φ u is non-negative and the rule behaves like a linear Hebbian rule. In the presence of somatic inhibition, y − φ u is zero for subthreshold inputs, negative for excitatory inputs that fail to elicit enough somatic spikes, and positive for those that produce a strong response. This yields a nonlinear Hebbian rule where the effective threshold between potentiation and depression depends on the amount of competition received for each particular input, without averaging over the recent activity of the neuron.
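The per-synapse update can be sketched in code, assuming the form Δw = η [x (y − φu) − αwu] described above; the function name and parameter values below are illustrative, not taken from the paper's implementation.

```python
def dw_feedforward(x, u, y, w, eta=0.01, phi=0.5, alpha=0.1):
    """Weight change for one dendritic synapse: a Hebbian term gated by
    the soma-dendrite mismatch (y - phi*u), plus a weight-decay term
    gated by the dendritic activity u."""
    return eta * (x * (y - phi * u) - alpha * w * u)

# Silent neuron (u = y = 0): the weight is untouched.
# Active dendrite but fully inhibited soma (y = 0): depression.
# Strong somatic response (y > phi*u): potentiation.
```

Because both negative contributions (−ηφxu and −ηαwu) are gated by u rather than y, somatic inhibition suppresses the potentiation term while leaving depression intact.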

Figure 2:
The learning rule produces a BCM-like curve controlled by somatic inhibition. Each curve plots the effective Hebbian nonlinearity y − φ u as a function of the net dendritic input (assuming w = 0 so that we can ignore the effect of the weight decay term −α w u). Injecting a constant inhibitory current into the soma (marked on the curves) shifts the potentiation threshold to the right. The bumps in the curves are a consequence of the way we compute the firing rate and mark the occurrence of an extra spike. Note: this figure was generated with a finer timestep dt = 0.01 ms to smooth the discontinuities in the curves caused by the discrete spike times.
Crucially, the two LTD terms are gated by the dendritic activity u, and will therefore not be suppressed by somatic inhibition, which affects only y. This allows the learning rule to depress the synapses that are active when the soma is strongly inhibited, shifting the distribution of the net dendritic input back to the left and making the neuron more selective as a result.

In contrast, Földiák [15], Zylberberg et al. [20] and King et al. [21] use a single heterosynaptic LTD term of the form −α w y that is gated by somatic activity and cannot induce depression in response to lateral inhibition. Instead, these networks work the other way around: they first make the output selective through IP, and then transfer that selectivity to the receptive fields by pruning the synapses that are silent when the neuron is highly active.
When φ > 0 the rule is able to convert some of the recurrent inhibition into feedforward inhibition, producing receptive fields that have both ON and OFF subfields even with a non-negative input. Note that homosynaptic LTP and LTD are swapped if we interpret negative weights as inhibitory synapses.
Finally, we apply a separate regularisation rule, whose strength is set by a parameter β and which is clipped so that it cannot change the sign of a weight. This does not fundamentally change the operation of the learning rule, but simplifies the receptive fields by suppressing the weights of weakly correlated input dimensions.
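One plausible form for such a sign-preserving decay is an L1-style shrinkage clipped at zero; this is our sketch of the mechanism, not the paper's exact equation, and β's value is illustrative.

```python
def regularise(w, beta=0.001):
    """Shrink a weight towards zero without ever letting it cross zero,
    so that excitatory synapses stay excitatory and vice versa."""
    if w > 0.0:
        return max(0.0, w - beta)
    if w < 0.0:
        return min(0.0, w + beta)
    return 0.0
```

Weights that are not reinforced by the Hebbian term erode towards zero, which is the pruning effect on weakly correlated input dimensions described above.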

Recurrent learning rule
The somatic synapses that mediate lateral inhibition are plastic as well. The weight v of each recurrent, somatic synapse between a pre- and a post-synaptic neuron follows a standard Hebbian rule with pre-synaptic gating:

Δv = η_r y_pre (y_post − γ v)

where η_r is the learning rate and γ controls the scale of the weights. Gating by y_pre ensures that the inhibition from a winning neuron to a losing neuron decays, but the reciprocal connection does not. The asymmetry prevents a single neuron from taking over all the input features [34]. In practice, we use a much faster learning rate for the recurrent inhibition compared to the feedforward synapses (η_r ≫ η); otherwise receptive fields are unstable and oscillate between selective and non-selective features.
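In code, assuming the pre-gated Hebbian form Δv = η_r y_pre (y_post − γv) given above (names and values are illustrative):

```python
def dv_recurrent(y_pre, y_post, v, eta_r=0.1, gamma=0.5):
    """Hebbian rule for an inhibitory somatic synapse, gated by the
    pre-synaptic rate: no change unless the pre-synaptic neuron fired."""
    return eta_r * y_pre * (y_post - gamma * v)

# Winner (pre) onto loser (post): the inhibitory weight decays.
# Loser (pre) onto winner (post): no change, preserving the asymmetry.
```

A silent pre-synaptic neuron leaves its outgoing inhibition untouched, which is exactly the asymmetry that prevents a single neuron from taking over all the input features.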

Receptive fields
Our first experiment is to look at the receptive fields of the neurons after training on various types of inputs. The expectation, for a sparse coding network, is that these receptive fields should correspond to selective features (rather than complete patterns) and that the neurons should be silent most of the time.
Trained on the MNIST dataset of handwritten digits [35], the network learns receptive fields that respond to fragments of digits or pen strokes, as shown in fig. 3. These receptive fields resemble the ones learned by sparse auto-encoders [8], despite the fact that we use a different algorithm, a coincidence which can be explained if these pen-stroke shapes are indeed the independent components of MNIST digits. The activity of the network is sparse throughout the training period, both in terms of lifetime and population sparseness (figs. 4, 5).
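Both kinds of sparseness can be quantified with the Treves-Rolls measure; the sketch below is a generic version of that metric, not necessarily the statistic behind figs. 4 and 5.

```python
def treves_rolls(rates):
    """Treves-Rolls sparseness, normalised to [0, 1]: 0 for uniform
    activity, 1 when a single unit is active. Applied across neurons for
    population sparseness, or across stimuli for lifetime sparseness."""
    n = len(rates)
    mean_sq = (sum(rates) / n) ** 2
    sq_mean = sum(r * r for r in rates) / n
    if sq_mean == 0.0:
        return 0.0  # no activity at all
    a = mean_sq / sq_mean
    return (1.0 - a) / (1.0 - 1.0 / n)
```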
We also test a variant of MNIST called Fashion-MNIST [36], which uses the same format but consists of small images of items of clothing like shoes and shirts. Training the network on that dataset extracts the outlines of the input stimuli and also separates some of their constituent parts (fig. 6).

Figure 3: The network learns pen-stroke shapes from MNIST digits. A: sample input stimuli. Black corresponds to zero and white to one. B: receptive fields (weights) of a network with 256 neurons after training on 120,000 digits (28 × 28 pixels) with random distortions. Middle gray corresponds to zero, lighter pixels to excitatory weights, and darker pixels to inhibitory weights.

We then train the network on natural images [38]. Images are typically not presented to the network in their raw form, but first processed either by a difference-of-Gaussians filter that models the transformations happening in the retina, or by a whitening transform that equalises the variance across spatial frequencies [39]. Both types of pre-processing lead to localised, oriented receptive fields. Compared to the natural scenes of Olshausen & Field [1], the NASA dataset yields slightly more elongated receptive fields but gives otherwise similar results. This is probably due to the more frequent occurrence of straight edges in indoor scenes.

Linear decoding
The next series of experiments aims to check whether the network's output is indeed a good encoding of the input. This does not necessarily follow from an analysis of the receptive fields; for instance, a network could succeed in extracting individual independent components, but still fail to encode the mixture of components present in any given input. More specifically, we would like to check whether the sparse encoding produced by the network can be linearly decoded, starting with a classification task on MNIST digits.

After that classification task, we turn to linear regression and attempt to reconstruct natural images from the output of the network. While Zylberberg et al. [20] inverted the transformation manually by reusing the network's encoding weights for decoding, here we train a linear model to predict the input patch given the sparse output of the network. We did not attempt to quantify the reconstruction error: pixel-wise measures such as the peak signal-to-noise ratio are neither very informative of how much structure is preserved, nor easy to interpret when comparing different scenes, and better metrics based on structural similarity are non-trivial to compute [41]. Qualitatively, we find that even a small network with 64 neurons preserves the general features of the scene (fig. 9), despite reducing the dimensionality of the data by a factor of 4. Larger networks are able to encode finer details: the larger text on the sample image starts to be legible with 256 sparse coding neurons.
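A linear readout of this kind can be trained online with the Delta rule the paper already cites [32,33]; the sketch below uses toy shapes and illustrative names, not the actual decoding setup.

```python
def train_readout(codes, targets, lr=0.05, epochs=200):
    """Train a linear readout D so that D @ code approximates the target,
    applying the Delta rule to each (code, target) pair in turn."""
    n_out, n_in = len(targets[0]), len(codes[0])
    D = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for z, t in zip(codes, targets):
            for k in range(n_out):
                pred = sum(D[k][j] * z[j] for j in range(n_in))
                err = t[k] - pred
                for j in range(n_in):
                    D[k][j] += lr * err * z[j]
    return D

# Toy example: two orthogonal sparse codes, one binary target unit.
D = train_readout([[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```

With a sparse code, each update touches only the weights of the few active units, one reason such readouts are cheap to learn.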

Stability and response to perturbations
In most machine learning experiments, the input data is randomised so that its distribution is mostly homogeneous over time. This is not the case for embodied agents that learn continuously: an animal samples from small regions of the input space as it moves from one place or activity to the next. Thus an important challenge in artificial neural networks is to learn online on non-homogeneous data. Sparse coding networks with a homeostatic term make an explicit assumption that the average firing rate of each neuron is constant, and the violation of that assumption could be a factor in catastrophic forgetting. The next experiment aims to explore whether the absence of a homeostatic term in our model makes it more robust to perturbations.
In fig. 10, we first train the network on the full MNIST dataset with Gaussian noise ( = 0.2) added to the digits and clipped to [0, 1]. After 150,000 stimuli, we remove the MNIST input and continue training on the background noise. We restore the input and train again on the full MNIST dataset for 150,000 stimuli. Finally, we perform one last training round on a subset of MNIST that contains only the zeroes, with all other digits removed.
We find that the receptive fields retain their selectivity despite fading during the period when the network receives only background noise, and recover with minimal changes when the original input is restored (fig. 11): thanks to the lack of fast IP, input deprivation does not induce catastrophic forgetting. As long as the distribution of the independent components remains the same, there is also no drift with continued learning (compare A and C in fig. 11). In contrast, we observed a constant shifting of the receptive fields when replicating other models such as the one by Zylberberg et al. [20]. However, the receptive fields do change rapidly when we switch from the full MNIST to zeroes only: they adapt to match the new distribution of the independent components and forget the features that were specific to other digits (such as straight lines). Thus the lack of IP protects against forgetting during input deprivation but does not block continual adaptation to the input, as long as the new stimuli overlap with existing receptive fields.

A small number of neurons (typically one or two) respond strongly to the noise during the period of input deprivation (bright receptive fields in fig. 11; dark lines in figs. 10, 12). Average firing rates for the other cells are low; again, this can be explained by the absence of a homeostatic term that would drive every neuron towards a target firing rate. Since the background noise does not contain any structure, these few active cells are sufficient to encode it and inhibit other neurons, protecting their receptive fields.
The transient increase in activity when the input is restored does not exceed three times the baseline: spikes remain sparse throughout, and come back to normal after 10 seconds (fig. 12). Since the neurons have fixed somatic and dendritic thresholds, that increase must come from the decay of lateral inhibition or from a shift in the excitatory/inhibitory balance of the feedforward weights. In contrast, in a network with IP, homeostatic adjustment of the thresholds to the background noise would cause a temporary saturation of the transfer function and loss of sparseness when the input is restored.

Figure 12: The network is robust to input deprivation. Top: mean number of spikes per neuron per stimulus; the green line marks the average value before mark A. Bottom: raster plot of the output spikes. Following input deprivation (mark A), firing patterns return to normal within 20 seconds after the input is restored (mark B), except for the neurons that responded strongly to the noise (these take somewhat longer).

Dendritic learning and compartmentalised inputs
Biological neurons are more than just thresholding devices [42]: they can perform computations that are significantly more complex than the traditional artificial neurons with a single summation stage feeding into a sigmoidal or rectified linear (ReLU) transfer function. These include temporal processing at various levels from the synapses to the soma, and multi-stage integration of inputs in the dendritic tree [43,44], achieving capabilities in individual cells that would normally require a network of point neurons.
With this paper, we give one example of the types of learning that become possible in neurons with separate compartments. Point neurons are capable of sparse coding, but they must approach the problem from the angle of lifetime sparseness. Our contribution is to show how the addition of a dendritic compartment gives the learning rule access to more information that lets it modulate nonlinear Hebbian learning via population sparseness as well; a regime that Körding & König [30] could not explore, as they used simpler stimuli made of a single independent component.

A single dendritic compartment is still a stark simplification over the finely branched structure of dendritic trees. Legenstein & Maass [45] use multiple dendrites to solve a nonlinear binding task where each neuron learns to respond to multiple patterns (for instance AB and CD), while ignoring other combinations of the same input dimensions (AC and BD); Hawkins & Ahmad [46] exploit a similar mechanism. As for Schiess et al. [47], they extend the somato-dendritic learning rule of Urbanczik & Senn [31] to reward-modulated learning with multiple dendrites. One area of future research is therefore to apply our sparse coding rule to neurons with more than two compartments, for instance for the purpose of learning sparse codes from a bottom-up input and predictive associations from top-down sources at the same time.
The wave of interest in modelling neural networks with compartmentalised inputs now extends to the neuromorphic hardware that can simulate them efficiently, with some experimental support for dendrites on the SpiNNaker chip [48], and multiple compartments and input traces on Intel's Loihi [49]. As the idea makes its way from biology to machine learning [46,50], we expect to see a shift from a paradigm where artificial neural networks employ very simple units and rely on supervised learning to distribute a task over this generic computational substrate, to a paradigm where single neurons perform a substantial amount of computation and where the structure of these neurons already encodes a particular approach towards solving the task.

Learning sparse codes without intrinsic plasticity
Our findings confirm that intrinsic plasticity is not strictly required to learn sparse codes. In addition to its role in decorrelating the population responses, plastic lateral inhibition can also regulate sparseness through its effect on the nonlinear Hebbian learning rule. Freed from the need to provide that fast negative feedback, IP could instead act on timescales slower than Hebbian plasticity, and help recruit previously silent dendrites.
Without fast IP, the network can be made more robust to temporary input deprivation. Because of their selective receptive fields, the neurons respond only weakly to background noise, and the post-synaptic gating in the learning rule protects the synaptic weights from rapid changes.
But replacing intrinsic plasticity with synaptic plasticity is still not enough to cope with the changes that an animal or robot would encounter as it switches between tasks and environments: the network remains susceptible to rapid and extensive reorganisation when novel inputs overlap the existing receptive fields, or when the distribution of the independent components changes. On the one hand, that kind of adaptability is desirable as natural environments are not static and the quick acquisition of novel stimuli can be critical for survival. But on the other hand, it should disturb existing receptive fields as little as possible so as not to erase previous experiences and all the associations that build upon them.
Although increasing sparseness and careful tuning of learning rates could help, it is likely that solving that stability-plasticity dilemma will require ad-hoc gating mechanisms. Some candidates are the conditional consolidation of synaptic changes [51], neuromodulation and attention [52,53], or a mechanism based on top-down prediction errors like the Adaptive Resonance Theory [54].

On sparse coding and associative readouts
In Buzsáki's perspective [3], every pathway that links two populations of cells involves a readout or transformation of one neural code into another. But in machine learning, complex transformations often require multiple layers. Does the brain use interneurons for that purpose, or does it somehow solve the problem without them? Although there are excitatory interneurons in the cortex (layer IV stellate cells), we know that inputs from distal areas converge directly onto the dendrites of pyramidal cells [55], which favours the direct readout hypothesis. That architecture would also scale better to larger networks.
Compared to dense codes (where every cell participates in coding every stimulus), there are two reasons why sparse codes should make direct readouts easier to learn. First, because fewer active units mean fewer weights to tune for any given mapping -in that sense, sparseness could act as a form of regularisation that prevents overfitting. Second, because separating the independent components of the signal should also help to disentangle the factors of variation, and make the problem more linearly tractable.
Compared to a local code (where every stimulus has its own dedicated cells), sparse codes should allow the readouts to generalise to novel inputs while retaining the ability to encode small differences. Instead of encoding each stimulus as a whole (as happens in nearest-neighbour clustering, self-organising maps and strict winner-take-all networks), a sparse coding network encodes each stimulus as a combination of features that it shares with other stimuli. It is thus able to respond to the familiar features of an unfamiliar input.
Our results indicate that the use of sparse codes can help, allowing a simple linear readout to reach the same accuracy as a multi-layer network trained on the raw input. This suggests that one could learn transformations from one sparse code to another sparse code by adding just an extra set of synapses to the target neurons. However, the decoding tasks we performed in this paper are not necessarily representative of the kind of readouts that a neural system embodied in an animal or robot needs to perform: in the case of MNIST, the target classes are few and mutually exclusive; and in the case of natural images, the output space is the same as the input space.
It would therefore be of value to test the idea with more realistic types of inputs and readout, for instance encoding sensory and motor information and learning predictive associations between one modality and another. And the lesser improvement in linear decoding on the Fashion-MNIST variant shows that detecting a complex arrangement between the parts of a sparse code may still require more than a single linear readout: not just an extra set of synapses, but an extra set of compartments as well, as in Legenstein & Maass [45], leveraging dendritic arithmetics to solve the task [44].

Biophysical interpretation
In certain aspects, the architecture of our network resembles cortical networks. Pyramidal cells are also the convergence point of distal inputs and local recurrent pathways, and their somas receive targeted inhibition from parvalbumin-containing basket cells that are activated by neighbouring neurons [56]. This suggests that the cortex might be making a similar use of compartmentalised inhibition for the purpose of learning sparse codes.
However, given the diversity of cortical inhibitory pathways [57], the roles of compartmentalised inhibition in cortical neurons must be considerably more complex than our model can possibly account for. For instance, other types of inhibitory interneurons bring recurrent inhibition to the dendrites [58,59]; one could attempt to include these in a computational model, following up on the work of Spratling & Johnson [60]. Wilmes et al. [61] also modelled the mechanism postulated by Körding & König [30], where inhibition does not suppress somatic spiking, but blocks the backpropagating action potentials on their way to the dendrites -a mechanism which could also sustain a sparse coding rule.
As it stands, our somato-dendritic learning rule (eqs. 1 and 2) contains a number of hypotheses about plasticity in biological neurons. Loosely speaking, the y term corresponds to backpropagating action potentials (bAPs), while the u term signals dendritic activity. This implies that somatic spikes should lead to long-term potentiation (LTP) of active excitatory synapses, while dendritic activity should cause depression (LTD). For dendritic inhibitory synapses, the situation should be reversed, with dendritic activity leading to LTP and bAPs leading to LTD.
There is some evidence in support of the first hypothesis: bAPs are a classical trigger of LTP, while dendritic spikes [62,63] and NMDA receptor activation [64,65] have both been linked to LTD or blockage of LTP. But there is also evidence to the contrary: NMDAR activation is central to LTP as well [64,66,67], and dendritic spikes can induce LTP without bAPs [68].
Nonetheless it seems that dendritic LTP without bAPs requires a local sodium spikelet [69,70]: a fast, spike-like depolarization that sometimes, but not always, accompanies NMDA spikes [71]. This suggests rephrasing our biophysical interpretation and equating the y term with fast voltage transients, either bAPs or sodium spikelets, while the u term would correspond to slower dendritic events like elevated calcium. The question is then what sort of conditions can trigger sodium spikelets in the absence of bAPs, and how the operation of the learning rule would change if these were included in the model.

As for inhibitory plasticity, Holmgren & Zilberter [72] bring some support to the notion that plasticity at dendritic inhibitory synapses could be reversed compared to excitatory synapses. They report LTD of inhibitory inputs coincident with bAPs, and LTP for those that come up to 800 ms after the train of action potentials. The latter fact also hints at a dimension of temporal processing which we ignore in this model and which we could explore in further work, for instance adapting the learning rule to learn transitions and sequences.
More generally, these questions call for further electrophysiological investigations of somato-dendritic plasticity rules. There would be much to learn from experiments that vary dendritic and somatic activity independently: controlling the number of somatic action potentials emitted during a dendritic spike, or the relative timing of somatic and dendritic events.

Somatic compartments and somatic synapses
The somatic compartments are standard LIF neurons. The membrane potential $V$ follows

$$\tau_m \frac{dV}{dt} = -(V - V_{\text{rest}}) + R\,(I_{\text{dend}} + I_{\text{syn}}),$$

where $I_{\text{dend}}$ and $I_{\text{syn}}$ are the currents from the dendrite and somatic synapses, respectively.
We use a fixed spiking threshold $V_{\text{th}}$ and after-spike reset to $V_{\text{reset}}$, without a refractory period:

$$V \ge V_{\text{th}} \;\Rightarrow\; \text{spike},\quad V \leftarrow V_{\text{reset}}.$$

That reset does not seem to be critical for our findings, but we did not explore the issue further.

We compute a firing rate that takes into account the number of spikes and also their latency relative to the stimulus onset $t_0$. First we define a trace $s(t)$ that increases by one after each spike at time $t_k$ and decays exponentially with time constant $\tau_s$:

$$\tau_s \frac{ds}{dt} = -s + \tau_s \sum_k \delta(t - t_k).$$

Then we normalise $s$ so that the area under the curve is the number of spikes, and integrate over the stimulus window of duration $T$:

$$r = \frac{1}{\tau_s} \int_{t_0}^{t_0 + T} s(t)\,dt.$$

Thus a spike that occurs towards the end of the window contributes less to the total than a spike that occurs early.
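Since each spike's normalised trace has unit area, the latency-weighted rate has a closed form: each spike contributes one minus the exponential tail that falls outside the window. The sketch below is an illustrative implementation, not the simulation code; the function name and the default `tau` are our own.

```python
import numpy as np

def latency_weighted_rate(spike_times, t0, T, tau=0.02):
    """Count spikes in [t0, t0+T], discounting late spikes.

    Each spike leaves an exponential trace with time constant tau,
    normalised so its full area equals one spike; integrating only
    up to t0+T truncates the tails of late spikes.
    """
    spike_times = np.asarray(spike_times, dtype=float)
    in_window = spike_times[(spike_times >= t0) & (spike_times < t0 + T)]
    # closed-form integral of each normalised trace over the window
    return float(np.sum(1.0 - np.exp(-((t0 + T) - in_window) / tau)))
```

An early spike contributes nearly 1 to the total, while a spike just before the end of the window contributes only the small fraction of its trace that fits inside it.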

Dendritic compartments
Dendrites are rate- and current-based. The net dendritic input to each neuron post is the weighted sum of the presynaptic inputs, $u_{\text{post}} = \sum_{\text{pre}} w_{\text{pre} \to \text{post}}\, x_{\text{pre}}$, which determines the dendritic activation $a_{\text{post}}$. The initial weights are drawn from a normal distribution (std = 0.01).
The current from the dendrite to the soma is a nonlinear function of the dendritic activation:

$$I_{\text{dend}} = \Theta(a - \theta)\,(I_0 + \lambda\, a),$$

where $\Theta$ is the Heaviside step function. Here the goal is to reproduce the active properties of biological dendrites. Above a certain input threshold, regenerative activation of the NMDA receptors causes dendritic spikes. These lead to a sharp increase in membrane potential followed by a plateau where stronger inputs cause no further increase in voltage [73,74]. We model this with the step function $\Theta$ and the offset $I_0$. However, stronger inputs do increase the duration, and reduce the rise time, of the plateau, producing more somatic spikes. We model this with the linear term $\lambda\, a$. In practice we adjust $I_0$ to cancel out the somatic rheobase, so that suprathreshold dendritic activation elicits at least one spike in the absence of somatic inhibition. We find that $\lambda$ is not critical for our findings as long as all dendrites respond to some inputs at the start of the simulation. Here we set it to zero, although a small positive value would better reproduce the data in Milojkovic et al. [73] and Oikonomou et al. [74]. Parameter values are listed in Table 3.

Receptive Fields
Throughout this paper we use the weights of the neurons as a proxy for their actual receptive fields. Showing all the weights of the network on the same image requires that we normalise each receptive field separately, because neurons that respond to narrow features have larger absolute weights than those that respond to broad ones. Nonetheless, we make sure that zero weights appear as the same middle gray for all neurons, allowing quick identification of ON (brighter) and OFF (darker) areas. Thus we normalise the receptive field $w_i = [w_{1 \to i}, \dots, w_{n \to i}]$ of each neuron $i$ as follows when generating the figures:

$$\tilde{w}_{j \to i} = \tfrac{1}{2}\left(1 + c\, w_{j \to i}\right), \quad c = \left(\max_{1 \le j \le n} |w_{j \to i}|\right)^{-1},$$

where $n$ is the number of input dimensions.
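In code, the per-neuron normalisation amounts to scaling each receptive field by the inverse of its largest absolute weight and mapping zero to mid-gray; a minimal sketch (function name ours):

```python
import numpy as np

def normalise_rf(w):
    """Map one neuron's weight vector to [0, 1], zero weights at 0.5.

    Scaling by the inverse of the largest absolute weight makes
    receptive fields comparable across neurons; ON weights appear
    brighter than 0.5 and OFF weights darker.
    """
    w = np.asarray(w, dtype=float)
    c = 1.0 / np.max(np.abs(w))
    return 0.5 * (1.0 + c * w)
```

For example, a weight vector `[-2, 0, 1]` maps to `[0.0, 0.5, 0.75]`: the strongest OFF weight goes to black and zero stays at mid-gray.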

MNIST
We use both the standard MNIST dataset [35] and the Fashion-MNIST variant [36], each with 60,000 training samples and 10,000 test samples. We map the full range of the data to the interval [0, 1]. When training the sparse coding network, we shuffle the patterns and distort them with random shears and translations, as done in LeCun et al. [40]. The purpose of these distortions is to increase the number of distinct training samples, and also to remove the correlations introduced by the centering of the patterns. We do this by applying the following affine transformation with the origin at the center of the pattern:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \left(\mathbb{I} + A\right) \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix},$$

where each element of $A$ is a random variable drawn from $\mathcal{N}(\sigma = 0.1)$, and each $t_i$ is a random variable drawn from $\mathcal{N}(\sigma = 2.0)$. Distorted digits produce more localised receptive fields than the centered patterns, which in turn improves the performance of classifiers trained on the output of the network. When training and testing the classifiers themselves, we freeze the weights of the sparse coding network and we use the plain stimuli without distortions.
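A minimal sketch of this distortion is given below, assuming nearest-neighbour resampling via inverse mapping (the resampling scheme and function names are our own, not taken from the text):

```python
import numpy as np

def random_distortion(rng, shear_std=0.1, shift_std=2.0):
    """Sample a random shear/translation pair for distorting a digit.

    Returns a 2x2 matrix A (identity plus Gaussian perturbations)
    and a Gaussian translation t; applied as x' = A x + t with the
    origin at the centre of the pattern.
    """
    A = np.eye(2) + rng.normal(0.0, shear_std, size=(2, 2))
    t = rng.normal(0.0, shift_std, size=2)
    return A, t

def apply_distortion(img, A, t):
    """Warp an image by inverse mapping with nearest-neighbour sampling."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys - cy, xs - cx], axis=-1)       # centre the origin
    src = (coords - t) @ np.linalg.inv(A).T + [cy, cx]   # inverse map
    sy = np.clip(np.round(src[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.round(src[..., 1]).astype(int), 0, w - 1)
    out = img[sy, sx]
    # blank out destination pixels whose source fell outside the image
    valid = (src[..., 0] >= 0) & (src[..., 0] <= h - 1) & \
            (src[..., 1] >= 0) & (src[..., 1] <= w - 1)
    return np.where(valid, out, 0.0)
```

Sampling a fresh `(A, t)` for every presentation makes each training pattern effectively unique and breaks the correlations introduced by the centering of the digits.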

Natural images
We use two datasets of photographic images: one by Olshausen & Field, which consists of natural outdoor scenes [76], and one compiled from public-domain images by NASA [37], which can be found in the supporting information of this paper (S1 File). In both cases, each image was converted to grayscale, resized to an area of 200,000 pixels, preprocessed using the same whitening transform as Olshausen & Field [1], and then normalised to unit variance. No further normalisation was applied to the individual patches used for training; in particular, the patch mean was not subtracted from the input. Note that in contrast to MNIST the natural image stimuli contain both positive and negative values. We interpret these as ON and OFF channels from the retina; while it would be more realistic to split the ON and OFF values into separate, non-negative channels, we did not attempt this here.
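The Olshausen & Field transform multiplies the amplitude spectrum by |f| with a steep low-pass rolloff, flattening the roughly 1/f spectrum of natural images while suppressing the highest, noise-dominated frequencies. A rough sketch follows; the `f0_fraction` cutoff parameterisation is our own assumption, not the exact value used in the text.

```python
import numpy as np

def whiten(image, f0_fraction=0.4):
    """Whitening/low-pass filter in the style of Olshausen & Field.

    Multiplies the spectrum by |f| * exp(-(|f|/f0)^4) and normalises
    the result to unit variance. The DC component is removed, since
    the filter gain is zero at f = 0.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fy ** 2 + fx ** 2)
    f0 = f0_fraction * 0.5                    # cutoff relative to Nyquist
    filt = f * np.exp(-(f / f0) ** 4)
    out = np.real(np.fft.ifft2(np.fft.fft2(image) * filt))
    return out / out.std()                    # normalise to unit variance
```

Because the filter removes the DC component, the whitened images are zero-mean and contain both positive and negative values, as noted above.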
For the reconstruction experiment, the input image was tiled into overlapping patches with a width of 16 pixels and a stride of 8 pixels. Each input patch was run through a sparse coding network pre-trained on the NASA dataset. The sparse output was then fed as the input to a linear model trained with ridge regression to reconstruct the original patches. Finally, the predicted patches were placed at their original locations and averaged to account for the stride overlap.
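The tiling and overlap-averaging steps can be sketched with the two helpers below; the sparse coding network and the ridge regression model would sit between them, and the function names are ours.

```python
import numpy as np

def extract_patches(image, width=16, stride=8):
    """Tile an image into overlapping square patches, row-major order."""
    h, w = image.shape
    return np.array([image[y:y + width, x:x + width]
                     for y in range(0, h - width + 1, stride)
                     for x in range(0, w - width + 1, stride)])

def reconstruct_from_patches(patches, image_shape, width=16, stride=8):
    """Place predicted patches at their locations, averaging overlaps."""
    out = np.zeros(image_shape)
    counts = np.zeros(image_shape)
    idx = 0
    for y in range(0, image_shape[0] - width + 1, stride):
        for x in range(0, image_shape[1] - width + 1, stride):
            out[y:y + width, x:x + width] += patches[idx]
            counts[y:y + width, x:x + width] += 1
            idx += 1
    return out / counts
```

With a stride of half the patch width, most pixels are covered by several patches, so the averaging smooths out seams between independently predicted patches.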