Sparse coding and dimensionality reduction in cortex

Supported by recent computational studies, sparse coding and dimensionality reduction are emerging as a ubiquitous coding strategy across brain regions and modalities. Together, they allow neurons to implement nonnegative sparse coding (NSC), efficiently encoding high-dimensional stimulus spaces using a sparse, parts-based population code. Reducing the dimensionality of complex, multimodal sensory streams is critically important for metabolically constrained brain areas to represent the world. In this article, we provide an overview of NSC, summarize evidence for its role in neural computation in disparate regions of the brain, ranging from visual processing to spatial navigation, and speculate that specific forms of synaptic plasticity and homeostatic modulation may underlie its implementation. We suggest that NSC may be an organizing principle of the nervous system.


A need for compressed representations of a high-dimensional world
Brains face the fundamental challenge of processing, storing, and representing high-dimensional stimuli using patterns of neural activity. This challenge is complicated by strict constraints on metabolic cost [1] and the widespread existence of anatomical bottlenecks [2][3][4][5], which often force the information stored in a large number of neurons to be compressed into an orders-of-magnitude smaller population of downstream neurons; for example, the output of roughly 100 million photoreceptors is carried by 1 million optic nerve fibers, and there is a 10- to 10,000-fold convergence from cortex to the basal ganglia [5].
One potential approach to addressing this challenge is to reduce the number of variables required to represent a particular stimulus space; a process known as dimensionality reduction (see Glossary). This idea features prominently in efficient coding theories of brain function [6][7][8][9], which posit that the brain performs dimensionality reduction by maximizing mutual information between the high-dimensional input and the low-dimensional output of neuronal populations. Several modern variants of the efficient coding hypothesis, such as independent component analysis (ICA) [10] and nonnegative sparse coding (NSC) [11,12], suggest that the role of cortical representations is to further reduce the redundancy of the sensory signal by separating it into its underlying causes, a process known as factor analysis.
Here we propose that a variety of neuronal responses can be understood as an emergent property of efficient population coding based on dimensionality reduction and sparse coding. Specifically, computational evidence from data analyses and computer simulations suggests that NSC, a combination of nonnegative matrix factorization (NMF) [13,14] and sparse population coding [15] (Box 1), generates low-dimensional embeddings of high-dimensional stimulus spaces that resemble neuronal population responses in a variety of brain regions. Reminiscent of basis function representations [16][17][18], these embeddings are both sparse and parts-based, allowing perceptual or behavioral variables of interest to be represented by taking a linear combination of neuronal responses. Furthermore, we suggest that these computations might be implemented in the brain through mechanisms of synaptic plasticity and homeostasis, which were previously shown to be capable of performing statistical inference on synaptic inputs [19][20][21][22].
Overall, our observations suggest that commonly reported neuronal response properties might simply be a by-product of neurons performing a biological equivalent of dimensionality reduction on their inputs via Hebbian-like learning mechanisms.
Nonnegative sparse coding as a modern variant of the efficient coding hypothesis

Early theories of efficient coding [6,23] were developed based on the visual system.
Realizing that the set of natural scenes was much smaller than the set of all possible images, it was argued that the visual system should not waste resources on processing arbitrary images. Instead, the nervous system should use statistical knowledge about its environment to represent the relevant input space as economically as possible.
Modern renditions of this theory (such as sparse coding [24] and ICA [10]) have refined the original hypotheses by tying neuronal response properties to the statistics of the natural environment (for a review, see [25]). These theories were shaped by two fundamental empirical observations about early visual cortex: 1) a neuron's receptive field (RF) resembled a decomposition of the visual stimulus into a series of local, largely independent feature components (e.g., a 2-D Gabor function is basically a local approximation of the directional spatial derivative of an image), and 2) any individual neuron responded only sparsely to a small subset of stimulus features (e.g., orientation or color at a particular spatial location). Thus a neuron's RF could be understood as a sparse, low-dimensional embedding of high-dimensional input stimuli. Such a representation is desirable from an energy-expenditure point of view, since it allows the visual system to represent any visual stimulus by activating only a small set of neurons, while most neurons in the population remain silent.
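The 2-D Gabor description of V1 receptive fields mentioned above can be sketched numerically. The snippet below is a minimal, illustrative construction (all parameter values are arbitrary choices, not fits to any recorded neuron): a Gaussian envelope multiplied by an oriented sinusoidal carrier, yielding the local, oriented ON/OFF structure characteristic of simple-cell RFs.

```python
import numpy as np

def gabor_rf(size=32, sigma=4.0, theta=0.0, wavelength=8.0, phase=0.0):
    """2-D Gabor patch: a Gaussian envelope times a sinusoidal carrier.
    Parameter values are illustrative, not fit to any neuron."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    # Rotate coordinates so the carrier varies along orientation theta
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

rf = gabor_rf(theta=np.pi / 4)   # 45-degree oriented RF, 32 x 32 pixels
```

A neuron with this RF responds strongly only to stimuli containing oriented contrast at this location, which is one concrete sense in which the code is local and sparse.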
Olshausen and Field [24] found that linear sparse coding of natural images yielded features qualitatively similar to RFs of simple cells in primary visual cortex (V1), thus giving empirically observed RFs an information-theoretic explanation. However, as pointed out by Hoyer [26], sparse coding falls short of providing a literal interpretation for V1 simple-cell behavior for two reasons: 1) every neuron could be either positively or negatively active, and 2) the input to the neural network was typically double-signed, whereas V1 neurons receive visual input from the lateral geniculate nucleus (LGN) in the form of separated, nonnegative ON and OFF channels.
Hoyer [11,26] proposed NSC as a way to transform Olshausen and Field's sparse coding from a relatively abstract model of image representation into a biologically plausible model of early visual cortex processing by enforcing both input signal and neuronal activation to be nonnegative. This seemingly simple change had remarkable consequences on the quality of the sensory representation: Whereas elementary image features in the standard sparse coding model could "cancel each other out" through subtractive interactions, enforcing nonnegativity ensured that features combined additively, much like the intuitive notion of combining parts to form a whole. The resulting parts-based representations resembled RFs in V1 more closely than other holistic representations, and would later be shown to apply to other brain regions as well, suggesting that parts-based representations are employed throughout cortex.
Inhibitory connections can be modeled in the same fashion, by interpreting inhibitory weights as nonnegative synaptic conductances, which not only preserves the parts-based quality of the encoding, but also allows for more complicated connection types to be modeled (e.g., V1 neurons receiving input from both excitatory ON and inhibitory OFF cells in the LGN) [26]. However, it is interesting to note that a more recent study has argued that the nonnegativity constraint on the synaptic weights might not be necessary to preserve the parts-based quality of the encoding [27].
Empirical evidence for nonnegative sparse coding in the brain

Reconstructing stimulus spaces using sparse, parts-based representations

An influential paper by Lee and Seung [14] found that applying a linear dimensionality reduction technique known as NMF to a database of face images yielded sparse, localized features that resembled parts of a face. NMF is a statistical matrix decomposition technique that takes an F × S data matrix V whose rows correspond to distinct features of the input (e.g., F different pixels of an image) and whose columns correspond to different stimuli or observations of those features (e.g., S different images).
Matrix V must be nonnegative, and NMF decomposes it into two reduced-rank matrices, W and H, such that the product of W and H provides an accurate reconstruction of V (Fig. 1).
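In code, the decomposition V ≈ WH can be sketched with scikit-learn's NMF implementation. The matrix sizes below are illustrative only (they are not the dimensions used in any study discussed here), and the data are synthetic with a planted low-rank structure.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
F, S, B = 100, 300, 10            # features, stimuli, basis functions (illustrative)
# Synthetic nonnegative data matrix V with a planted rank-B structure
V = rng.random((F, B)) @ rng.random((B, S))

model = NMF(n_components=B, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)        # F x B: basis functions over input features
H = model.components_             # B x S: hidden coefficients per stimulus
V_hat = W @ H                     # low-rank reconstruction of V
```

Because both factors are constrained to be nonnegative, each stimulus (column of V) is reconstructed as an additive combination of the columns of W, which is what gives the representation its parts-based character.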
On its surface, NMF would appear to be unrelated to the mechanisms underlying artificial or biological neural networks; however, Lee and Seung's intuitive mapping of these variables onto a neural network forms the cornerstone of our conception of encoding variables or "basis images". Such a representation is reminiscent of neural processing in inferotemporal cortex (IT), an area in the ventral visual "what" stream involved in encoding high-level object identity [28,29], where images of whole faces can be linearly reconstructed using responses of approximately 200 neurons that each respond to certain sets of physical facial features [30].
Remarkably, such a parts-based representation is not specific to information processing in IT; the same principle extends to other areas of the visual system, such as the dorsal subregion of the medial superior temporal area (MSTd), which is part of the visual motion pathway [31]. Neurons in MSTd respond to relatively large and complex patterns of retinal motion ("optic flow"), owing to input from direction- and speed-selective neurons in the middle temporal area (MT) (for a recent review, see [32]). Although MSTd had long been suspected to be involved in the analysis of self-motion, the complexity of neuronal response properties has made it difficult to experimentally investigate how neurons in MSTd might perform this function. However, when Beyeler and colleagues [31] applied NMF to MT-like patterns of activity, they found a sparse, parts-based representation of retinal flow (Fig. 1B2) similar to the parts-based representation of faces encountered by Lee and Seung [14]. The resulting "basis flow fields" showed a remarkable resemblance to receptive fields of MSTd neurons, as they preferred an intricate mixture of 3D translational and rotational flow components in a subset of the visual field. As a result, any flow field likely to be encountered during self-motion through a 3D environment could be represented by only B = 64 model MSTd neurons, as compared to F = 9,000 simulated MT input neurons.
This led to an efficient and sparse population code, where any given stimulus could be represented by only a small number of simulated MSTd neurons (population sparsity), and any given model neuron was activated by only a small number of stimuli (lifetime sparsity) [31].
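Both forms of sparsity can be quantified. One common choice (among several in the literature) is the Treves-Rolls sparseness measure, applied across neurons for a fixed stimulus (population sparsity) or across stimuli for a fixed neuron (lifetime sparsity). The response matrix below is a toy example, not data from any of the studies discussed.

```python
import numpy as np

def sparseness(r):
    """Treves-Rolls sparseness of a nonnegative response vector r.
    Returns a value in [0, 1]: 0 for a uniform (dense) response,
    1 when exactly one element is active."""
    r = np.asarray(r, dtype=float)
    n = r.size
    a = (r.mean() ** 2) / np.mean(r ** 2)     # Treves-Rolls activity ratio
    return (1 - a) / (1 - 1 / n)

# Toy response matrix: rows are neurons, columns are stimuli
R = np.array([[5.0, 0.0, 0.0, 0.0],
              [0.0, 4.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 2.0]])

pop_sparsity = [sparseness(R[:, s]) for s in range(R.shape[1])]   # per stimulus
life_sparsity = [sparseness(R[n, :]) for n in range(R.shape[0])]  # per neuron
```

High values on both axes correspond to the regime described above: few neurons per stimulus, and few stimuli per neuron.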
Analogously, NSC can explain response properties of neurons outside the visual system, such as in the retrosplenial cortex (RSC), an area important for navigation and spatial memory [33][34][35]. Neurons in the RSC conjunctively encode multiple variables related to the environment and one's position and movement within it, allowing spatial features of the environment to be represented with respect to multiple reference frames [36]. This was determined in an experiment in which rats ran back and forth on a W-shaped track that occupied two different locations within the room (i.e., track positions α and β) across recording sessions, testing sensitivity to the allocentric reference frame. Outbound and inbound runs were made up of opposite turn sequences (left-right-left (LRL) and right-left-right (RLR), respectively) that corresponded to different sets of trials, which allowed assessment of sensitivity to the egocentric and route-based reference frames. However, establishing a mechanistic link between physiological response properties of RSC neurons and their underlying representations of space has proved difficult, both because of the complexity of their response properties and because inputs to the region are not easily isolated.
We applied NMF to recorded data from the original RSC experiment conducted by Alexander and Nitz [36]. During the experiment, activity from 228 RSC neurons was recorded along with four behavioral metrics: linear velocity, angular velocity, head direction, and allocentric position. Using Gaussian and cosine tuning curves, we created idealized input neurons that encoded these four variables. By applying NMF to the idealized neuronal output, we were able to replicate functionality observed in the biological RSC (Fig. 1B3). Once again, the dimensionality was reduced from a set of F = 417 input neurons to a set of B = 30 basis functions.
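A toy version of this idealized-input construction can be sketched as follows. This is a hedged illustration, not a reproduction of the actual analysis: the numbers of neurons, tuning widths, and the simulated trajectory are all invented, and only two of the four behavioral variables (position with Gaussian tuning, head direction with rectified cosine tuning) are included.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
T = 2000                                   # time samples along a simulated run
pos = np.linspace(0.0, 1.0, T)             # normalized track position
hd = 2 * np.pi * rng.random(T)             # head direction samples (radians)

# Gaussian tuning curves for position (illustrative: 40 input neurons)
centers = np.linspace(0.0, 1.0, 40)
pos_rates = np.exp(-(pos[None, :] - centers[:, None]) ** 2 / (2 * 0.05 ** 2))

# Cosine tuning for head direction (illustrative: 20 input neurons),
# rectified so that firing rates stay nonnegative
prefs = np.linspace(0.0, 2 * np.pi, 20, endpoint=False)
hd_rates = np.maximum(0.0, np.cos(hd[None, :] - prefs[:, None]))

V = np.vstack([pos_rates, hd_rates])       # F x T idealized input activity
W = NMF(n_components=30, max_iter=500, random_state=0).fit_transform(V)
```

As in the RSC analysis, the basis functions (columns of W) each mix a subset of the idealized inputs, compressing the F input channels into a much smaller set of conjunctive outputs.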
Although there seems to be a consensus that information-theoretic explanations are relevant when investigating the early visual system, higher-order brain areas are often considered to be specialized for performing tasks (e.g., recognizing objects, making decisions, navigating an environment), rather than the efficient encoding of information.
However, the finding that NSC could be used to explain neuronal responses in the visual and retrosplenial cortices introduces the possibility that it might apply elsewhere in the brain. In fact, sparse (and potentially parts-based) representations have been observed in olfactory, auditory, somatosensory, and motor cortices (Table 1). This raises the possibility that NSC might in fact be a general principle to which neuronal computations adhere.

Table 1: Nonnegative sparse coding in the brain. We list brain regions for which there is experimental evidence of certain features associated with NSC ('X': evidence exists, '?': has yet to be investigated). For each brain region, the left-hand side of the table lists experimental evidence for sparse and/or parts-based representations, whereas the right-hand side lists computational support that NSC can describe receptive fields or response properties within that region. [Table body not reproduced here; surviving entries include auditory cortex [45] and hippocampus [52].]

Understanding neuronal response properties within the framework of NSC

Neuronal response properties can also be computed for NSC model neurons, which means that model neurons can be evaluated using methods similar to those employed by experimental researchers to characterize biological neurons, and by theoretical neuroscientists to probe computational models. This is important because NSC can be used to model neural activity in the brain, and the resulting activity patterns generated by NSC can be compared to and evaluated against experimental findings.
Neuronal response patterns can be computed by reconstructing the activity of a population of "input" neurons (i.e., the s-th column of V) from a population of "output" neurons (i.e., the s-th column of H) via their synaptic weights (i.e., the matrix W), and then comparing the resulting model activity to empirically observed response properties. For example, in the visual system, an NMF-based model was able to reconstruct neuronal spike trains in the salamander retina [37]. Following testing on ground-truth data, the researchers recorded spikes from in vitro retinal ganglion cells while the cells were exposed to natural scenes (either still photographs or videos).
They then applied several factorization methods to the data. Space-by-time NMF could decompose the data into separate spatial and temporal modules that yielded sparser and more compact representations than other techniques, including orthogonal Tucker-2 and basic NMF. NSC could also be used to reproduce response properties of V1 simple and complex cells [26,38] as well as V2 hypercomplex cells [54]. Outside the visual stream, a model known as Reinforcement-Driven Dimensionality Reduction (RDDR) successfully used Hebbian learning to reproduce basal ganglia response patterns associated with reward [50], a function associated with cortico-striato-pallidal circuitry. The authors later applied nonnegativity constraints to the Hebbian learning in the model so that it performed NMF on its inputs. The model advanced understanding of the cortico-striato-pallidal loop by capturing the behavior of the circuit while explaining the existence of convergent and lateral connections in the region that other models have historically ignored [3]. The authors suggest that the basal ganglia use unsupervised, reward-driven learning to perform dimensionality reduction on cortical inputs, efficiently compressing information so that upcoming actions can be planned in the frontal cortex.
The finding that NSC can account for neuronal response properties across brain areas and modalities supports the view that these preferences emerge from the pressure to find efficient representations of perceptually or behaviorally relevant stimulus spaces. At the population level, NSC promotes representations in which neurons act as generalists rather than specialists, allowing for the simultaneous encoding of multiple variables of interest (e.g., heading, eye rotation velocity in MSTd [31]) with respect to multiple frames of reference (e.g., egocentric, route-based in RSC [55]). Among the advantages of such basis function representations [16][17][18] (also called mixed-selectivity representations [56][57][58]) are robustness to noise as well as the ability to decode various variables of interest through a linear combination of neuronal responses.

Approximating NMF with STDPH
That NSC can explain and reproduce response properties observed in biological neurons may be an important clue as to how brains have evolved to parse and store information. One candidate NSC process in the brain is synaptic plasticity which, similar to NSC, may act on synapses to reduce the dimensionality of an input space in order to represent it efficiently and sparsely. Specifically, we propose that NSC might be functionally equivalent to spike-timing dependent plasticity combined with homeostatic synaptic scaling (STDPH) (Box 2). Spike-timing dependent plasticity (STDP) is a spike-timing based correlative learning rule in which the timing of the pre- and post-synaptic spikes determines the amplitude and sign of synaptic weight changes (i.e., long-term depression or long-term potentiation) [59,60]. Homeostasis acts to keep neuronal activity within a stable working range, acting across synapses of a single neuron and across multiple neurons [61]. Whereas STDP acts over a relatively short time scale (i.e., milliseconds to seconds), homeostasis operates over minutes to days. Homeostasis also facilitates synaptic competition by normalizing the inputs to a neuron. This levels the playing field for synapses that weaken due to imbalances in activity, but might otherwise strengthen if left to their own devices [62].
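The two mechanisms just described can be sketched in a few lines of code. The exponential STDP window and the multiplicative scaling rule below are standard textbook forms, but all parameter values (amplitudes, time constants, learning rates, target rate) are illustrative choices, not values from any of the cited models.

```python
import numpy as np

# Pair-based STDP window (parameter values are illustrative)
A_plus, A_minus = 0.01, 0.012
tau_plus, tau_minus = 20.0, 20.0              # ms

def stdp_dw(dt):
    """Weight change for one pre/post spike pair.
    dt = t_post - t_pre: pre-before-post (dt > 0) potentiates (LTP),
    post-before-pre (dt < 0) depresses (LTD)."""
    if dt > 0:
        return A_plus * np.exp(-dt / tau_plus)
    return -A_minus * np.exp(dt / tau_minus)

def scale_weights(w, rate, target_rate, eta=0.1):
    """Multiplicative homeostatic synaptic scaling: nudge all of a
    neuron's input weights up or down toward a target firing rate."""
    return w * (1.0 + eta * (target_rate - rate) / target_rate)

w = np.full(10, 0.5)                           # ten input synapses
w = w + stdp_dw(+10.0)                         # pre leads post: strengthen
w = scale_weights(w, rate=12.0, target_rate=10.0)  # too active: scale down
```

Note the separation of time scales in the sketch: the STDP update applies per spike pair, whereas scaling would be applied only after averaging the firing rate over a much longer window.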
Experimental and theoretical evidence suggests that biological processes, such as synaptic plasticity and homeostatic mechanisms, may reduce the dimensionality of inputs in a way similar to NMF. For example, Carlson and colleagues [20] provided a mathematical proof that STDPH can approximate the NMF algorithm (see Box 2).
Similar to Oja's rule [22], which was developed to stabilize rate-based Hebbian learning (effectively resulting in principal component analysis (PCA)), synaptic scaling acts as a homeostatic mechanism to stabilize STDP (effectively resulting in NMF). This finding suggests that neurons are able to find accurate factorial representations of their input stimulus spaces by means of Hebbian-like synaptic learning. In addition, sparsity of the encoding might be enforced by spike thresholding [63] and lateral inhibitory connections [64].
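The analogy to Oja's rule can be made concrete with a minimal rate-based simulation: a plain Hebbian update plus Oja's decay term converges to the first principal component of its inputs while keeping the weight norm bounded. The data distribution and learning rate below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Zero-mean 2-D data stretched along [1, 1]: that direction is the first PC
A = np.array([[2.0, 1.8], [1.8, 2.0]])
X = rng.normal(size=(5000, 2)) @ A

w = rng.normal(size=2)                 # initial weight vector
eta = 0.001
for x in X:
    y = w @ x                          # linear neuron's output
    w += eta * y * (x - y * w)         # Oja's rule: Hebbian term y*x plus a
                                       # decay -y^2*w that bounds ||w|| near 1

u = w / np.linalg.norm(w)              # learned direction (~ first PC)
```

The decay term plays the same stabilizing role here that synaptic scaling is argued to play for STDP: without it, the pure Hebbian term y*x grows the weights without bound.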
To investigate this equivalence in a more realistic setting, we conducted simulations in which a model constructed using STDPH and a model constructed using NSC were applied to a dataset of neuronal activity recorded in the RSC of rats during a spatial navigation task (as shown for NSC in Figure 1B3). In the STDPH experiment, the idealized input neurons generated spike trains as input to a population of spiking neural networks (SNNs) that were trained using an evolutionary strategy (Box 3) to match the recorded electrophysiological data [55]. In the NSC experiment, the idealized neuronal output was used to construct a matrix of training data associated with each of the four recorded 'features' (linear velocity, angular velocity, head direction, and position), to which NMF was applied.
We found that the activity patterns of both NSC and STDPH model neurons could replicate the neuronal response properties and ensemble activity seen in the electrophysiologically recorded neurons in the dataset (Fig. 3A, B, C, left); that is, the model neuron activity could be classified into three broad categories, with remarkably similar population statistics to rat RSC [36]: 1) Turn-sensitive, no modulation neurons responded whenever the animal made a left or right turn on the track (light gray); 2) Turn-sensitive, route modulation neurons responded whenever the animal made a turn on a specific position along the route, independent of allocentric location (dark gray); and 3) Turn-insensitive neurons that could not be classified according to the above, but nonetheless exhibited complex and robust firing patterns (white). In addition, both NSC and STDPH model RSC produced simulated neurons whose ensemble activity vectors could be used to predict the agent's location on a route with respect to the allocentric frame of reference. Ensemble activity patterns could also disambiguate the agent's location within routes that occupied different locations in the room, consistent with findings of population behavior in the biological RSC [36]. When even and odd trials on the same track locations were compared, ensemble prediction error was very low, but when the tracks were in two different locations (i.e., α vs. β), the prediction error was significantly higher in all cases (Fig. 3A, B, C, right). For further details on this analysis, see [36,55].
These findings lend credence to our proposal that NMF and STDPH are functionally equivalent. Furthermore, the matrix W produced by NMF when applied to the RSC dataset was qualitatively similar to the synaptic weight matrices produced by simulations in which STDPH was evolved to match the same dataset.

Figure 3 caption (partial): B) Simulated using NMF. C) Simulated by evolving STDPH parameters to fit experimental data from [55].

Concluding remarks and future directions
In this article, we have argued that a variety of neuronal response properties can be understood as an emergent property of efficient population coding based on dimensionality reduction and sparse coding. We offer three testable predictions of this theory: First, we predict that parts-based representations can explain RFs of neurons in a variety of brain regions, including but not limited to those brain areas discussed here (i.e., V1, MSTd and RSC). In agreement with the literature on basis function representations [16][17][18], we expect parts-based representations to be prevalent in regions where neurons exhibit a range of tuning behaviors [31], display mixed selectivity [56,57], or encode information in multiple reference frames [36,55].
Second, where such representations occur, we expect the resulting neuronal population activity to be sparse, in order to encode information both accurately and efficiently.
Sparse codes offer a trade-off between dense codes (where every neuron is involved in every context, leading to great memory capacity but suffering from cross talk among neurons) and local codes (where there is no interference, but also no capacity for generalization) [65].
Third, we propose that STDPH carries out a similar function to NMF, and may be attempting to approximate the linear RF of neurons participating in sparse, parts-based representations throughout the brain. In combination, STDPH and NMF can effectively produce NSC. With the emergence of computational tools developed to understand the neural code in high stimulus dimensions [66], we expect to see qualitative similarities between empirically observed RFs and those recovered by NMF and STDPH. Such findings would be consistent with the idea that neurons can perform statistical inference on their inputs via Hebbian-like learning mechanisms [19][20][21][22].
In summary, we suggest that the evidence reviewed in this paper may indicate that dimensionality reduction through nonnegative sparse coding, implemented by synaptic plasticity rules, may be a canonical computation throughout the brain.

Box 1: Nonnegative sparse coding (NSC)
Nonnegative sparse coding (NSC) combines nonnegative matrix factorization (NMF), a linear dimensionality reduction technique from statistical learning, with sparse population coding from neural network theory [11,67]. NMF belongs to a class of methods that decompose a multivariate data matrix V into an inner product of two reduced-rank matrices W and H. NMF assumes that the observed data in V are generated by linear combinations of nonnegative basis functions (the columns of W), weighted by nonnegative hidden coefficients (the columns of H). The goal of NSC is then to find a linear decomposition of V that minimizes the reconstruction error, while guaranteeing that both W and H are sparse. This can be achieved by minimizing the following cost function [11]:

C(W, H) = (1/2) ||V - W H||^2 + λ Σ_ij H_ij, subject to W_ij ≥ 0, H_ij ≥ 0,

where λ ≥ 0 controls the trade-off between accurate reconstruction and sparsity of the hidden coefficients.

Box 2: Spike-timing dependent plasticity and homeostatic synaptic scaling (STDPH)

Spike-timing dependent plasticity (STDP) is a correlative learning rule in which synaptic weight changes depend on the relative timing of pre- and post-synaptic spikes [59,60]. If pre-synaptic spike counts integrated over a critical window precede those of the post-synaptic neuron, then long-term potentiation (LTP) is induced, strengthening the weight. Otherwise, long-term depression (LTD) is induced, suppressing the weight. STDP is ubiquitous throughout the brain across the lifespan and can take on different forms [68,69]. Simulations of STDP can result in runaway feedforward excitation that produces unrealistic firing rates. In biology, this problem is obviated by homeostatic mechanisms that modulate synaptic inputs and neuronal firing thresholds [70].
We focus on synaptic scaling, a form of homeostasis that multiplicatively scales a neuron's input weights up or down depending on its average firing rate, and which has been demonstrated to stabilize STDP [20,71,72]. We refer to this combination of STDP and homeostatic synaptic scaling as STDPH.
We have shown that, given a network with fewer output than input neurons and full connectivity from the input layer to the output layer, STDPH iteratively acts to preserve the information of the input layer in the output layer neurons. Given an input layer of neurons (represented by matrix V) connected to a single output layer neuron (represented by row vector h^T), we have proven that the STDPH rule iteratively updates the synaptic weights w so as to minimize the reconstruction error ||V - w h^T||, and that the NMF update rule is mathematically equivalent to the STDPH update rule [20].
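For intuition, the rank-1 NMF objective from this proof can be minimized numerically with the classic Lee-Seung multiplicative updates, which keep w and h nonnegative at every step. This is an illustrative sketch on synthetic data, not the STDPH update itself; it only shows what quantity the learning rule is argued to minimize.

```python
import numpy as np

rng = np.random.default_rng(0)
F, S = 50, 200
# Nonnegative input activity with a planted rank-1 component plus small noise
V = np.outer(rng.random(F), rng.random(S)) + 0.01 * rng.random((F, S))

w = rng.random(F)          # synaptic weights onto the single output neuron
h = rng.random(S)          # output neuron's activity across stimuli

for _ in range(200):
    # Lee-Seung multiplicative updates, rank-1 case: each step keeps
    # w and h nonnegative and does not increase ||V - w h^T||^2
    h *= (w @ V) / (w @ w * h + 1e-12)
    w *= (V @ h) / (h @ h * w + 1e-12)

err = float(np.linalg.norm(V - np.outer(w, h)) / np.linalg.norm(V))
```

After a few hundred iterations the reconstruction error approaches the noise floor, mirroring the claim that STDPH drives w toward a factorization of its inputs.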
To induce sparsity in networks of neurons, there must be a mechanism that causes each output neuron to learn a different component of the total information represented by the input layer [73], which may be achieved by competition among units. Lateral or feedforward inhibition is often used to induce competition in SNNs, since it offers a biologically plausible mechanism for implementing a winner-take-all (WTA) architecture [64], although other groups have suggested that thresholding can also achieve sparse codes [63]. Therefore, NSC can be attained in a two-layer network with STDPH and a WTA architecture via inhibitory connections within or between layers. Interestingly, popular implementations of NMF (e.g., MATLAB, Scikit-Learn) include a constraint on the L1 norm of H that automatically leads to sparsity. Alternatively, an explicit sparsity level can be incorporated into models of NMF, such as in Hoyer's model [12].

Box 3: Evolutionary algorithms for tuning spiking neural networks

In an evolutionary algorithm, a population of candidate networks ("individuals") is evaluated on a fitness function, and the fittest individuals are selected as parents for the next generation, which can undergo crossover (i.e., where a child inherits parameters from its parents whose values have been swapped at random) and/or mutation (i.e., where a child inherits parameters from its parents with values randomly chosen from a distribution of values) to produce a new generation of individuals. This process continues until the fitness of the population no longer improves.
Carlson and colleagues [74] developed an automated tuning framework that incorporates a plugin to CARLsim [75] for running evolutionary algorithms using the ECJ library [76]. The plugin is used to pass information about the SNN population (e.g., number of generations to run, parameters to be evolved and their ranges, mutation and crossover rates) back and forth between CARLsim and ECJ (Fig. 4). The networks in the population are created, trained, tested, and evaluated using CARLsim. The resulting fitness scores for each individual in the population are then handed off to ECJ, which determines the parameters for the next generation according to the selection criteria defined in the parameter file, and the cycle repeats.
In the experiments in which SNNs were used to replicate neurophysiological data, the evolutionary algorithm was used to optimize STDPH parameters in the model, corresponding to the temporal window over which spike times were integrated, the amount each synaptic weight increased or decreased, the initial weight range of each group of synaptic connections, and the target baseline firing rates of excitatory and inhibitory neurons. During each generation of the evolutionary run, models were trained on trials randomly selected from half of the available dataset, then tested on trials randomly selected from the latter half reserved for testing. At the end of each generation, synthetic neural activity was correlated with electrophysiologically recorded neural activity, which served as the fitness metric for the algorithm. After the model finished an evolutionary run, SNN activity closely matched the electrophysiological activity.
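The evolve-evaluate-select loop described above can be caricatured in a few lines. This is a deliberately simplified stand-in: the fitness function below scores distance to a hypothetical target parameter vector, whereas in the actual experiments fitness was the correlation between SNN activity and electrophysiological recordings, and evaluation ran through the CARLsim/ECJ pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical STDPH parameter vector standing in for a "good" network;
# a real fitness evaluation would simulate the SNN and score its activity
target = np.array([20.0, 0.01, 0.012, 10.0])

def fitness(params):
    return -float(np.sum((params - target) ** 2))   # higher is better

pop = [rng.uniform(0.0, 25.0, size=4) for _ in range(50)]
for generation in range(100):
    scored = sorted(pop, key=fitness, reverse=True)
    parents = scored[:10]                   # selection: keep the fittest
    pop = [p + rng.normal(0.0, 0.5, size=4) # mutation around each parent
           for p in parents for _ in range(5)]

best = max(pop, key=fitness)
```

Crossover is omitted here for brevity; truncation selection plus Gaussian mutation is enough to show how the population converges toward high-fitness parameter settings over generations.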
Figure 4: CARLsim-ECJ pipeline for evolving SNNs using ECJ (adapted from [74]). ECJ is used to initialize the parameters being tuned, which are passed to CARLsim and used to set the parameters of each network in the population of SNNs (red arrow). The SNNs are initialized, run, and evaluated using CARLsim. Following fitness evaluation using some user-defined criteria, the resulting fitness values for each individual in the population are sent to ECJ (black arrow). ECJ chooses parents from the population and performs selection, replication, and mutation on the parameters of the parent individuals in order to initialize the next generation of SNNs. This continues for a user-defined number of runs or until convergence.

Glossary
• Basis functions A lower-dimensional set of linearly independent elements that can represent a high-dimensional input space via a weighted sum, where the weight of each element is given by a separate hidden coefficient.
• Dimensionality reduction The process of representing a space using the smallest possible number of dimensions that still captures the variance of the original data, typically via feature extraction. In the case of neuronal firing rate patterns, this means representing all possible firing rate patterns in a brain region using the smallest possible subset of neurons.
• Factor analysis A statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved, uncorrelated variables called factors (or latent variables).
• Receptive field The structure and boundaries of an individual neuron's pattern of response to various kinds of incoming stimuli.
• Spike-timing dependent plasticity A Hebbian-inspired learning rule in which weight updates are computed from the precise spike times of pre- and post-synaptic neurons, inducing either long-term potentiation or long-term depression in the synapse depending on whether the total pre-synaptic spike count preceded the total post-synaptic spike count, integrated over a critical temporal window.